TRANSGENIC PLANTS EXPRESSING PHOTORHABDUS TOXIN

BACKGROUND OF THE INVENTION 
As reported in WO98/08932, protein toxins from the
genus Photorhabdus have been shown to have oral toxicity
against insects. The toxin complex produced by
Photorhabdus luminescens (W-14), for example, has been
shown to contain ten to fourteen proteins, and it is
known that these are produced by expression of genes from
four distinct genomic regions: tca, tcb, tcc, and tcd.
WO98/08932 discloses nucleotide sequences for the native
toxin genes.

Of the separate toxins isolated from Photorhabdus
luminescens (W-14), those designated Toxin A and Toxin B
are especially potent against target insect species of
interest, for example corn rootworm. Toxin A is
comprised of two different subunits. The native gene
tcdA (SEQ ID NO:1) encodes protoxin TcdA (see SEQ ID
NO:1). As determined by mass spectrometry, TcdA is
processed by one or more proteases to provide Toxin A.
More specifically, TcdA is an approximately 282.9 kDa
protein (2516 aa) that is processed to provide TcdAii, an
approximately 208.2 kDa (1849 aa) protein encoded by
nucleotides 265-5811 of SEQ ID NO:1, and TcdAiii, an
approximately 63.5 kDa (579 aa) protein encoded by
nucleotides 5812-7551 of SEQ ID NO:1.

Toxin B is similarly comprised of two different
subunits. The native gene tcbA (SEQ ID NO:2) encodes
protoxin TcbA (see SEQ ID NO:2). As determined by mass
spectrometry, TcbA is processed by one or more proteases
to provide Toxin B. More specifically, TcbA is an
approximately 280.6 kDa (2504 aa) protein that is
processed to provide TcbAii, an approximately 207.7 kDa
(1844 aa) protein encoded by nucleotides 262-5793 of SEQ
ID NO:2, and TcbAiii, an approximately 62.9 kDa (573 aa)
protein encoded by nucleotides 5794-7512 of SEQ ID NO:2.
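
The nucleotide ranges and residue counts quoted above for the Toxin A and Toxin B subunits can be cross-checked with simple arithmetic. The following is only an illustrative sketch: the helper name codon_count and the dictionary layout are hypothetical, and the only inputs are the ranges stated in the text.

```python
def codon_count(start: int, end: int) -> int:
    """Return the number of codons in a 1-based, inclusive nucleotide range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


# Nucleotide ranges as stated in the text for SEQ ID NO:1 and SEQ ID NO:2.
SUBUNIT_RANGES = {
    "TcdAii": (265, 5811),
    "TcdAiii": (5812, 7551),
    "TcbAii": (262, 5793),
    "TcbAiii": (5794, 7512),
}

codon_counts = {name: codon_count(*rng) for name, rng in SUBUNIT_RANGES.items()}

for name, count in codon_counts.items():
    print(f"{name}: {count} codons")
```

Under these inputs the ranges give 1849, 580, 1844 and 573 codons. Three of these equal the stated residue counts; the TcdAiii range gives one codon more than the stated 579 aa, which would be consistent with that range including a stop codon, although the text does not say so.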
The native tcdA and tcbA genes are not well suited
for high level expression in plants. They encode
multiple destabilization sequences, mRNA splice sites,
polyA addition sites and other possibly detrimental
sequence motifs. In addition, the codon compositions are
not like those of plant genes. WO98/08932 gives general
guidance on how the toxin genes could be reengineered to
be more efficiently expressed in the cytoplasm of plants,
and describes how plants can be transformed to
incorporate the Photorhabdus toxin genes into their
genomes.

SUMMARY OF THE INVENTION 
In a preferred embodiment, the invention provides
novel polynucleotide sequences that encode TcdA and TcbA.
The novel sequences have base compositions that differ
substantially from the native genes, making them more
similar to plant genes. The new sequences are suitable
for high expression in both monocots and dicots,
and this feature is designated by referring to the
sequences as satisfying the "hemicot" criteria, which are
set forth in detail hereinafter. Other important features
of the sequences are that potentially deleterious sequences
have been eliminated, and unique restriction sites have been
built in to enable adding or changing expression
elements, organellar targeting signals, engineered
protease sites, and the like, if desired.

In a particularly preferred embodiment, the
invention provides polynucleotide sequences that satisfy
hemicot criteria and that comprise a sequence encoding an
endoplasmic reticulum signal or similar targeting
sequence for a cellular organelle in combination with a
sequence encoding TcdA or TcbA.

More broadly, the invention provides engineered
nucleic acids encoding functional Photorhabdus toxins
wherein the sequences satisfy hemicot criteria.

The invention also provides transgenic plants with
genomes comprising a novel sequence of the invention that
imparts functional activity against insects.

BRIEF DESCRIPTION OF SEQUENCES

SEQ ID NO: 1 is the native tcdA DNA sequence together
with the corresponding encoded amino acid sequence for
TcdA.

SEQ ID NO: 2 is the native tcbA DNA sequence together
with the corresponding encoded amino acid sequence for
TcbA.

SEQ ID NO: 3 is an artificial sequence encoding TcdA
that is suitable for expression in monocot and dicot
plants.

SEQ ID NO: 4 is an artificial sequence encoding TcbA
that is suitable for expression in monocot and dicot
plants.

SEQ ID NO: 5 is an artificial hemicot sequence that
encodes the 21 amino acid ER signal peptide of 15 kDa
zein from Black Mexican Sweet maize.

SEQ ID NO: 6 is an artificial hemicot sequence that
encodes the full-length native TcdA protein (amino
acids 22-2537) fused to the modified 15 kDa zein
endoplasmic reticulum signal peptide (amino acids 1-21).

DETAILED DESCRIPTION

The native Photorhabdus toxins are protein complexes
that are produced and secreted by growing bacterial cells
of the genus Photorhabdus. Of particular interest are
the proteins produced by the species Photorhabdus
luminescens. The protein complexes have a molecular size
of approximately 1,000 kDa and can be separated by
SDS-PAGE gel analysis into numerous component proteins. The
toxins contain no hemolysin, lipase, type C
phospholipase, or nuclease activities. The toxins
exhibit significant toxicity upon ingestion by a number
of insects.

A unique feature of Photorhabdus is its
bioluminescence. Photorhabdus may be isolated from a
variety of sources. One such source is nematodes, more
particularly nematodes of the genus Heterorhabditis.
Another such source is human clinical samples from
wounds; see Farmer et al. 1989 J. Clin. Microbiol. 27 pp.
1594-1600. These saprophytic strains are deposited in the
American Type Culture Collection (Rockville, MD) as ATCC #s
43948, 43949, 43950, 43951, and 43952, and are
incorporated herein by reference. It is possible that
other sources could harbor Photorhabdus bacteria that
produce insecticidal toxins. Such sources in the
environment could be either terrestrial or aquatic based.
The genus Photorhabdus is taxonomically defined as a
member of the Family Enterobacteriaceae, although it has
certain traits atypical of this family. For example,
strains of this genus are nitrate reduction negative,
yellow and red pigment producing, and bioluminescent.
This latter trait is otherwise unknown within the
Enterobacteriaceae. Photorhabdus has only recently been
described as a genus separate from Xenorhabdus
(Boemare et al., 1993 Int. J. Syst. Bacteriol. 43, 249-255).
This differentiation is based on DNA-DNA
hybridization studies, phenotypic differences (e.g.,
presence (Photorhabdus) or absence (Xenorhabdus) of
catalase and bioluminescence), and the Family of the
nematode host (Xenorhabdus: Steinernematidae;
Photorhabdus: Heterorhabditidae). Comparative cellular
fatty-acid analyses (Janse et al. 1990, Lett. Appl.
Microbiol. 10, 131-135; Suzuki et al. 1990, J. Gen. Appl.
Microbiol., 36, 393-401) support the separation of
Photorhabdus from Xenorhabdus.

Currently, the bacterial genus Photorhabdus is
comprised of a single defined species, Photorhabdus
luminescens (ATCC Type strain #29999, Poinar et al.,
1977, Nematologica 23, 97-102). A variety of related
strains have been described in the literature (e.g.,
Akhurst et al. 1988 J. Gen. Microbiol., 134, 1835-1845;
Boemare et al. 1993 Int. J. Syst. Bacteriol. 43 pp. 249-255;
Putz et al. 1990, Appl. Environ. Microbiol., 56,
181-186).

The following toxin-producing Photorhabdus strains
have been deposited:




WO 01/11029    PCT/US00/22237



strain              accession number    date of deposit
W-14                ATCC 55397          March 5, 1993
WX1                 NRRL B-21710        April 29, 1997
WX2                 NRRL B-21711        April 29, 1997
WX3                 NRRL B-21712        April 29, 1997
WX4                 NRRL B-21713        April 29, 1997
WX5                 NRRL B-21714        April 29, 1997
WX6                 NRRL B-21715        April 29, 1997
WX7                 NRRL B-21716        April 29, 1997
WX8                 NRRL B-21717        April 29, 1997
WX9                 NRRL B-21718        April 29, 1997
WX10                NRRL B-21719        April 29, 1997
WX11                NRRL B-21720        April 29, 1997
WX12                NRRL B-21721        April 29, 1997
WX14                NRRL B-21722        April 29, 1997
WX15                NRRL B-21723        April 29, 1997
H9                  NRRL B-21727        April 29, 1997
Hb                  NRRL B-21726        April 29, 1997
Hm                  NRRL B-21725        April 29, 1997
HP88                NRRL B-21724        April 29, 1997
NC-1                NRRL B-21728        April 29, 1997
W30                 NRRL B-21729        April 29, 1997
WIR                 NRRL B-21730        April 29, 1997
B2                  NRRL B-21731        April 29, 1997
ATCC 43948          ATCC 55878          November 5, 1996
ATCC 43949          ATCC 55879          November 5, 1996
ATCC 43950          ATCC 55880          November 5, 1996
ATCC 43951          ATCC 55881          November 5, 1996
ATCC 43952          ATCC 55882          November 5, 1996
DEP1                NRRL B-21707        April 29, 1997
DEP2                NRRL B-21708        April 29, 1997
DEP3                NRRL B-21709        April 29, 1997
P. zealandrica      NRRL B-21683        April 29, 1997
P. hepialus         NRRL B-21684        April 29, 1997
HB-Arg              NRRL B-21685        April 29, 1997
HB Oswego           NRRL B-21686        April 29, 1997
Hb Lewiston         NRRL B-21687        April 29, 1997
K-122               NRRL B-21688        April 29, 1997
HMGD                NRRL B-21689        April 29, 1997
Indicus             NRRL B-21690        April 29, 1997
GD                  NRRL B-21691        April 29, 1997
PWH-5               NRRL B-21692        April 29, 1997
Megidis             NRRL B-21693        April 29, 1997
HF-85               NRRL B-21694        April 29, 1997
A. Cows             NRRL B-21695        April 29, 1997
MP1                 NRRL B-21696        April 29, 1997
MP2                 NRRL B-21697        April 29, 1997
MP3                 NRRL B-21698        April 29, 1997
MP4                 NRRL B-21699        April 29, 1997
MP5                 NRRL B-21700        April 29, 1997
GL98                NRRL B-21701        April 29, 1997
GL101               NRRL B-21702        April 29, 1997
GL138               NRRL B-21703        April 29, 1997
GL155               NRRL B-21704        April 29, 1997
GL217               NRRL B-21705        April 29, 1997
GL257               NRRL B-21706        April 29, 1997

All strains were deposited in accordance with the
terms of the Budapest Treaty. Strains having
accession numbers prefaced by "ATCC" were deposited
on the indicated date in the American Type Culture
Collection, 12301 Parklawn Drive, Rockville, MD
20852 USA. Strains prefaced by "NRRL" were
deposited on the indicated date in the Agricultural
Research Service Patent Culture Collection (NRRL),
National Center for Agricultural Utilization
Research, ARS-USDA, 1815 North University St.,
Peoria, IL 61604 USA.

The present invention provides hemicot nucleic acid
sequences encoding toxins from any Photorhabdus species
or strain that produces a toxin having functional
activity. Hemicot nucleic acid sequences encoding
proteins homologous to such toxins are also encompassed
by the invention.

Several terms that are used herein have a particular 
meaning and are defined as follows: 

By "functional activity" it is meant herein that the
protein toxin(s) function as insect control agents in that
the proteins are orally active, or have a toxic effect,
or are able to disrupt or deter feeding, which may or may
not cause death of the insect. When an insect comes into
contact with an effective amount of toxin delivered via
transgenic plant expression, formulated protein
composition(s), sprayable protein composition(s), a bait
matrix or other delivery system, the results are
typically death of the insect, or the insects do not feed
upon the source which makes the toxins available to the
insects.

By "homolog" it is meant an amino acid sequence that
is identified as possessing homology to a reference
Photorhabdus toxin polypeptide amino acid sequence.

By "homology" it is meant an amino acid sequence
that has a similarity index of at least 33% and/or an
identity index of at least 26% to a reference
Photorhabdus toxin polypeptide amino acid sequence, as
scored by the GAP algorithm using the BLOSUM 62 protein
scoring matrix (Wisconsin Package Version 9.0, Genetics
Computer Group (GCG), Madison, WI).

By "identity" is meant an amino acid sequence that
contains an identical residue at a given position,
following alignment with a reference Photorhabdus toxin
polypeptide amino acid sequence by the GAP algorithm.
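
The identity and homology definitions above can be illustrated with a short sketch. It is not the GAP algorithm: it assumes the two sequences have already been aligned (gaps written as "-"), and the choice of denominator (aligned positions where neither sequence has a gap) is an assumption rather than something stated in the text. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped aligned positions carrying the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # gapped columns are skipped (assumption)
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * identical / compared


def meets_homology_definition(similarity_pct: float, identity_pct: float) -> bool:
    """Apply the 'at least 33% similarity and/or at least 26% identity' test."""
    return similarity_pct >= 33.0 or identity_pct >= 26.0


# Toy pre-aligned peptides (hypothetical, for illustration only).
identity = percent_identity("MKT-LL", "MRTALL")
```

The similarity index depends on the scoring matrix and is therefore passed in as a precomputed number here rather than derived.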

By the use of the term "Photorhabdus toxin" it is
meant any protein produced by a Photorhabdus
microorganism strain which has functional activity
against insects, where the Photorhabdus toxin could be
formulated as a sprayable composition, expressed by a
transgenic plant, formulated as a bait matrix, delivered
via baculovirus, or delivered by any other applicable
host or delivery system.

By the use of the term "toxic" or "toxicity" as used
herein it is meant that the toxins produced by
Photorhabdus have "functional activity" as defined
herein.

By "substantial sequence homology" is meant either:
a DNA fragment having a nucleotide sequence sufficiently
similar to another DNA fragment to produce a protein
having similar biochemical properties; or a polypeptide
having an amino acid sequence sufficiently similar to
another polypeptide to exhibit similar biochemical
properties.

As with other bacterial toxins, the rate of mutation
of the bacteria in a population causes many related
toxins slightly different in sequence to exist. Toxins
of interest here are those which produce protein
complexes toxic to a variety of insects upon exposure, as
described herein. Preferably, the toxins are active
against Lepidoptera, Coleoptera, Homoptera, Diptera,
Hymenoptera, Dictyoptera and Acarina. The inventions
herein are intended to capture the protein toxins
homologous to protein toxins produced by the strains
herein and any derivative strains thereof, as well as any
protein toxins produced by Photorhabdus. These
homologous proteins may differ in sequence, but do not
differ in function from those toxins described herein.
Homologous toxins are meant to include protein complexes
of between 300 kDa to 2,000 kDa and are comprised of at
least two (2) subunits, where a subunit is a peptide which
may or may not be the same as the other subunit. Various
protein subunits have been identified and are taught in
the Examples herein. Typically, the protein subunits are
between about 18 kDa to about 230 kDa; between about 160
kDa to about 230 kDa; 100 kDa to 160 kDa; about 80 kDa to
about 100 kDa; and about 50 kDa to about 80 kDa.

As discussed above, some Photorhabdus strains can be
isolated from nematodes. Some nematodes, elongated
cylindrical parasitic worms of the phylum Nematoda, have
evolved an ability to exploit insect larvae as a favored
growth environment. The insect larvae provide a source
of food for growing nematodes and an environment in which
to reproduce. One dramatic effect that follows invasion
of larvae by certain nematodes is larval death. Larval
death results from the presence of, in certain nematodes,
bacteria that produce an insecticidal toxin which arrests
larval growth and inhibits feeding activity.

Interestingly, it appears that each genus of insect
parasitic nematode hosts a particular species of
bacterium, uniquely adapted for symbiotic growth with
that nematode. In the interim since this research was
initiated, the bacterial genus Xenorhabdus
was reclassified into Xenorhabdus and
Photorhabdus. Bacteria of the genus Photorhabdus are
characterized as being symbionts of Heterorhabditis
nematodes while Xenorhabdus species are symbionts of
Steinernema species. This change in nomenclature is
reflected in this specification, but in no way should a
change in nomenclature alter the scope of the inventions
described herein.

The peptides and genes that are disclosed herein are
named according to the guidelines recently published in
the Journal of Bacteriology ("Instructions to Authors", p.
i-xii, Jan. 1996), which is incorporated herein by
reference.

Transformation methods useful in carrying out the
invention are well known, and are described, for example,
in WO98/08932.

Hemicot tcdA and tcbA 

SEQ ID NO: 3 is the nucleotide sequence for an
engineered tcdA gene in accordance with the invention.
SEQ ID NO: 4 is the nucleotide sequence for an engineered
tcbA gene in accordance with the invention.

The following Tables 1 and 2 identify significant
features of the engineered tcdA and tcbA genes.



Table 1
tcdA

Feature                     nucleotides of SEQ ID NO: 3
NcoI                        1-6
HindIII                     48-53
KpnI                        246-254
sequence encoding TcbAii    267-5798
NheI                        333-338
BglII                       1215-1220
ClaI                        2604-2609
PstI                        4015-4020
AgeI                        5088-5093
MunI                        5598-5603
XbaI                        5778-5783
sequence encoding TcbAiii   5799-7517
AflII                       5853-5858
SphI                        6439-6444
SfuI                        7392-7397
SacI                        7519-7524
XhoI                        7522-7527
StuI                        7528-7533
NotI                        7533-7538


Table 2
tcbA

Feature                     nucleotides of SEQ ID NO: 5
NcoI                        1-6
HindIII                     48-53
KpnI                        246-251
sequence encoding TcbAii    267-5798
NheI                        333-338
BglII                       1215-1220
ClaI                        2604-2609
PstI                        4015-4020
AgeI                        5088-5093
MunI                        5598-5603
XbaI                        5778-5783
sequence encoding TcbAiii   5799-7517
AflII                       5853-5858
SphI                        6439-6444
SfuI                        7392-7397
SacI                        7519-7524
XhoI                        7522-7527
StuI                        7528-7533
NotI                        7535-7540



It should be noted that the proteins encoded by the
plant-optimized tcdA (SEQ ID NO: 3) and tcbA (SEQ ID
NO: 5) differ from the native proteins by the addition of
an Ala residue at position #2. This modification was
made to accommodate the NcoI site which spans the ATG
start codon.
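
The reason an added Ala codon accommodates the NcoI site can be sketched in a few lines. NcoI recognizes CCATGG, so a site spanning the ATG start codon requires the next codon to begin with G; the GCT (Ala) codon named in Example 1 satisfies this. The toy coding sequence and the function names below are hypothetical, and the two upstream C residues are assumed to correspond to nucleotides 1-2 of the engineered sequences (NcoI at 1-6 in Tables 1 and 2).

```python
NCOI_SITE = "CCATGG"   # NcoI recognition sequence
ALA_CODON = "GCT"      # Ala codon inserted at position 2 (per Example 1)


def insert_ala_at_position_2(cds: str) -> str:
    """Insert an Ala codon directly after the ATG start codon."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence must begin with ATG")
    return "ATG" + ALA_CODON + cds[3:]


def has_ncoi_at_start(cds: str, upstream: str = "CC") -> bool:
    """True if upstream + cds begins with an NcoI site spanning the ATG."""
    return (upstream + cds).upper().startswith(NCOI_SITE)


toy_cds = "ATGAACGAGTCT"                     # hypothetical native-like start
engineered = insert_ala_at_position_2(toy_cds)

print(has_ncoi_at_start(toy_cds))            # site absent before the change
print(has_ncoi_at_start(engineered))         # site present after the change
```

A native gene whose second codon already begins with G would not need the insertion; the sketch only shows why a codon starting with G is required.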

The following Table 3 compares the codon composition of
the engineered tcdA gene of SEQ ID NO: 3 and engineered
tcbA gene of SEQ ID NO: 5 with the codon compositions of
the native genes, the typical dicot genes, and maize
genes.



Table 3

Amino   Codon   % in SEQ    % in        % in SEQ    % in        % in        % in
acid            ID NO: 3    tcdA        ID NO: 5    tcbA        dicot       maize
Ala     GCT     62          21          69          41          42          24
        GCC     26          32          27          17          27          34
        GCA     11          25          4           22          25          18
        GCG     0           21          0           21          6           24
Arg     AGG     48          0           60          2           25          26
        CGC     22          36          18          16          11          24
        AGA     20          11          15          6           30          15
        CGT     11          39          7           57          21          11
        CGG     0           7           0           13          4           15
        CGA     0           8           0           6           8           9
Asn     AAC     100         32          100         33          55          68
        AAT     0           68          0           67          45          32
Asp     GAC     67          22          70          25          42          63
        GAT     33          78          30          75          58          37
Cys     TGC     100         30          100         19          56          68
        TGT     0           70          0           81          44          32
Stop    TGA     100         0           100         0           33          59
        TAG     0           0           0           0           19          21
        TAA     0           100         0           100         48          20
Gln     ?       ?           ?           74          53          59          38
        ?       ?           ?           ?           47          41          62
Glu     ?       ?           ?           ?           36          51          71
        ?       ?           ?           ?           64          49          29
Gly     ?       ?           ?           64          44          33          20
        ?       ?           ?           36          22          16          42
        ?       ?           90          ?           19          38          19
        GGG     0           8           0           16          12          20
His     ?       ?           40          72          31          46          62
        ?       ?           60          28          69          54          38
Ile     ATC     ?           ?           ?           ?           37          58
        ATT     ?           ?           ?           ?           ?           29
        ATA     ?           ?           ?           17          18          14
Leu     CTC     54          ?           ?           ?           ?           ?
        TTG     ?           17          ?           ?           26          15
        CTT     ?           ?           ?           7           19          17
        TTA     ?           18          0           19          10          5
        CTG     0           32          0           29          9           29
        CTA     0           13          0           7           8           8
Lys     AAG     99          79          99          75          ?           ?
?       ?       0           76          0           81          43          27
Val     ?       69          27          73          11          20          31
        GTG     21          17          22          27          29          39
        ?       10          34          3           48          39          21
        GTA     0           22          2           14          12          ?
EXAMPLE 1

Design Of Plant Codon-Biased Genes Encoding W-14 Peptides
TcbA and TcdA

A. Gene Design 






The coding strands of the native DNA sequences of the
Photorhabdus W-14 genes encoding peptides TcbA and TcdA
were scanned for the presence of deleterious sequences
such as the Shaw/Kamen RNA destabilizing motif ATTTA,
intron splice recognition sites, and poly A addition
motifs. This was done using the MacVector Sequence
Analysis Software (Oxford Molecular Biology Group,
Symantec Corp.), using a custom Nucleic Acid Subsequence
File. The native sequence was also searched for runs of
4 or more of the same base.
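
The kind of scan described above can be approximated in a few lines. This is a minimal sketch and not the MacVector subsequence file used by the inventors: it covers only the ATTTA motif and single-base runs, because the splice-site and poly A motifs are not spelled out in the text. The function names and the toy sequence are hypothetical.

```python
import re


def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    seq, motif = seq.upper(), motif.upper()
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def find_runs(seq: str, min_len: int = 4):
    """Return (start, base, length) for every run of min_len or more identical bases."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


toy = "GGATTTAAAACCATTTATTTA"   # hypothetical coding-strand fragment
attta_hits = count_overlapping(toy, "ATTTA")
runs = find_runs(toy)

print(attta_hits, runs)
```

Start positions are 0-based. Overlapping matches are counted because two ATTTA motifs can share a terminal A, as in the toy fragment.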

Motif searching of the native W-14 tcbA and tcdA
genes revealed the presence of many potentially
deleterious sequences in the protein coding strands, as
summarized in Table 4. Not shown, but also present, were
many runs of four or more single residues (e.g., the
native tcbA gene has 81 runs of four A's).



Table 4

Native Gene   ATTTA   5' Splice   3' Splice   Poly A Addition*   RNA Pol II term.
tcbA          18      7           17          46                 0
tcdA          18      7           13          77                 1

* Totals of 16 different motifs.



Analyses of eukaryotic genes and plant genes in
particular have shown that CG & TA doublets are
underrepresented, while the genes are enriched in CT & TG
doublets. The sequences of the hemicot biased genes have
accordingly been adjusted to encompass these base
compositions and to have G+C compositions of about 53%,
similar to many plant genes. When compared to the native
W-14 tcbA and tcdA genes, the plant-biased genes have a
much more uniform G+C distribution.
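
The two quantities discussed above, overall G+C content and doublet (dinucleotide) frequencies, are straightforward to measure on any candidate sequence. The following is a sketch only; the function names and the toy sequence are hypothetical and are not taken from SEQ ID NO: 3 or 4.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def doublet_counts(seq: str) -> Counter:
    """Count every overlapping two-base window in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


toy = "ATGGCTCTGTAA"   # hypothetical fragment
gc = gc_percent(toy)
doublets = doublet_counts(toy)

for pair in ("CG", "TA", "CT", "TG"):
    print(pair, doublets[pair])
```

Applying gc_percent to successive windows of a sequence, rather than to the whole sequence, would give the G+C distribution along the gene that the text compares between native and plant-biased versions.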

Nucleotide changes to remove potentially deleterious
sequences were chosen to simultaneously adjust the codon
composition of the coding region to more closely reflect
that of plant genes. A framework for these changes was
provided by the codon bias tables prepared for maize and
dicot genes shown in Table 3.

Comparison of codon compositions of the native W-14
genes to maize and dicot genes revealed that the W-14
genes contain a very different preference set of the
degenerate codons for the 18 amino acids for which there
is a choice (Table 3). For each of 8 amino acids (Phe,
Tyr, Cys, Arg, Asn, Lys, Glu, and Gly) in both W-14
genes, the most abundant codon is different from the
preferred codons found in either maize or dicot genes.
One might expect that translational difficulties would be
encountered in efforts to produce in plants proteins
(such as TcbA and TcdA) having high relative amounts of
these amino acids from mRNAs having large numbers of
nonpreferred codons. There is a marked difference in
distribution of the codon compositions specifying the
other 10 amino acids. For His, Gln, Ile, Val, and Asp,
the dicot-preferred codons are found as the most abundant
ones in both W-14 genes. For Leu, Thr, Ser, and Ala, the
maize-preferred codons are the most abundant codon
choices found in the tcdA gene. In contrast, the tcbA
gene contains only the CCG (Pro) maize-preferred codon as
the highest abundance choice.
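
Codon compositions of the kind tabulated in Table 3 are percentages of each codon among all codons for the same amino acid. A minimal sketch of that calculation follows; it assumes the standard genetic code, and the function name and toy coding sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_usage(cds: str) -> dict:
    """Map amino acid -> {codon: percent of that amino acid's codons}."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    usage = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        usage[aa][codon] = 100.0 * n / totals[aa]
    return dict(usage)


toy_cds = "GCTGCTGCCGCAAACAAC"   # hypothetical: Ala x4, Asn x2
usage = codon_usage(toy_cds)
print(usage)
```

Codons that never occur in the input are simply absent from the result, so a 0% entry as in Table 3 would need to be filled in by the caller.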

In making the codon choices, doublet contents were
considered, so that adjacent codons preferably did not
form CG or TA doublets (which are underrepresented in
eukaryotic genes; 1, 4), while CT or TG doublets (which
are enriched in eukaryotic genes, ibid.) were created when
possible.
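
The doublets formed between adjacent codons are those made of the last base of one codon and the first base of the next. The sketch below shows how a synonymous codon choice can be checked against the CG/TA rule; the function names and the two toy sequences (both encoding Ala-Asp-Phe) are hypothetical illustrations and are not taken from the engineered genes.

```python
AVOIDED = {"CG", "TA"}   # doublets to avoid at codon junctions
FAVORED = {"CT", "TG"}   # doublets to create when possible


def junction_doublets(cds: str):
    """Return (junction index, doublet) for each pair of adjacent codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return [(i, left[-1] + right[0])
            for i, (left, right) in enumerate(zip(codons, codons[1:]))]


def avoided_junctions(cds: str):
    """Junction doublets that fall in the avoided set."""
    return [(i, d) for i, d in junction_doublets(cds) if d in AVOIDED]


option_a = "GCCGACTTC"   # Ala (GCC) - Asp - Phe: GCC|GAC forms a CG junction
option_b = "GCTGACTTC"   # Ala (GCT) - Asp - Phe: GCT|GAC forms a TG junction

print(avoided_junctions(option_a))
print(avoided_junctions(option_b))
```

Only junctions are examined here; doublets lying wholly inside a codon are fixed by the codon choice itself and would be covered by a whole-sequence doublet count.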

Choices were also made to utilize a diversity of
codons for Met, Trp, Asn, Asp, Cys, Glu, His, Ile, Lys,
Phe, Thr, and Tyr.

The sequences were also designed to encode unique
6-bp recognition sites for restriction enzymes, spaced
about every 1200 bp. Finally, an additional codon (GCT;
Ala) was inserted at the second position to encode an
NcoI recognition site encompassing the ATG (Met) start
codon. Additional recognition sites were included after
the Stop codon to facilitate subsequent cloning steps
into expression vectors. These features are set forth
above in Tables 1 and 2.

The new tcdA and tcbA genes of SEQ ID NO: 3 and SEQ
ID NO: 4 share 73.5% and 72.6% identity, respectively, to
their native W-14 counterparts (Wisconsin Genetics
Computer Group, GAP algorithm).

B. Gene Synthesis

The complete synthesis of the plant codon-biased
tcbA and tcdA genes was performed under contract by
Operon Technologies, Inc. (OPTI, Alameda, CA).
Basically, chemically synthesized oligonucleotides of
appropriate sequence were assembled into DNA pieces about
500 bases long. These were joined together end-to-end
(presumably by means of appropriately placed restriction
enzyme sites) into four larger pieces of roughly 2
kilobase pairs (kbp) each; therefore each comprised about
1/4 of the entire coding region of the particular gene.
DNA sequence of the pieces was confirmed at this step.
If mistakes in sequence were present, the appropriate
oligonucleotides were re-synthesized, and the assembly
process was repeated. Once gene fractional parts were
sequence verified, they were assembled in pairs to make
the gene halves, and again sequence verified. Finally,
the two halves were joined, and the sequences of the
junctions between the halves were verified. Therefore,
each part of the new gene was sequence verified at least
twice.

It should be noted that attempts to express the
native tcbA or tcdA genes in standard Escherichia coli
cloning strains suggest that production of these
proteins is lethal. Lethality problems may be
encountered if standard cloning vectors having leaky
expression from inherent lacZ promoters are used to
assemble these genes.

C. Addition Of Endoplasmic Reticulum Targeting Peptide To
TcdA Coding Region

It is known to those in the field of plant gene
expression that proteins are specifically directed into
the endoplasmic reticulum (ER) by means of a short signal
peptide which is removed during or after the transport
process through the ER membrane. The mature (processed)
protein is incorporated into the ER endomembrane or is
released into the ER lumen where the transported protein
may be uniquely folded (aided by chaperonins), modified
by glycosylation, accumulated in the vacuole, or
additionally translocated (by secretion). These
processes are reviewed by Gomord and Faye [V. Gomord and
L. Faye, (1996) Signals and mechanisms involved in
intracellular transport of secreted proteins in plants.
Plant Physiol. Biochem. 34:165-181] and by Bar-Peled et
al. [M. Bar-Peled, D. C. Bassham, and N. V. Raikhel,
(1996) Transport of proteins in eukaryotic cells: more
questions ahead. Plant Molec. Biology 32:223-249]. It is
also known that the subcellular recognition mechanisms
for an ER signal peptide are evolutionarily somewhat
conserved, since the ER signal for a protein normally
produced in monocot (maize) cells is recognized and
processed normally by dicot (tobacco) cells. This is
exemplified by the maize 15 kDa zein ER signal peptide
[L. M. Hoffman, D. D. Donaldson, R. Bookland, K. Rashka,
and E. M. Herman, (1987) Synthesis and protein body
deposition of maize 15-kd zein in transgenic tobacco
seeds. EMBO J. 6:3213-3221, and U.S. Patent 5589616].
Further, it is known that the ER signal peptide derived
from one protein can direct the translocation of a
different protein if it is appropriately attached to the
second protein by genetic engineering methods [D. C. Hunt
and M. J. Chrispeels, (1991) The signal peptide of a
vacuolar protein is necessary and sufficient for the
efficient secretion of a cytosolic protein. Plant
Physiol. 96:18-25, and Denecke, J., J. Botterman, and R.
Deblaere (1990) Protein secretion in plants can occur via
a default pathway. Plant Cell 2:51-59]. Therefore, one
may expose a protein in vivo to different biochemical
environments by directing its accumulation in the cytosol
(by not providing a signal peptide sequence), or in the
ER/vacuole (by provision of an appropriate signal
peptide).

The ER signal peptide of maize 15 kDa zein proteins
is known to comprise the first 20 amino acids encoded by
the zein coding region. Two examples of such signal
peptides are the ER signal peptide of 15 kDa zein from A5707
maize, NCBI Accession # M72708, and the ER signal peptide
of 15 kDa zein from Black Mexican Sweet maize, NCBI
Accession # M13507. There is only a single amino acid
difference (Ser vs Cys at residue 17) between these
signal peptides.

SEQ ID NO: 5 is a modified sequence coding the ER
signal peptide of 15 kDa zein from Black Mexican Sweet
maize. The modifications embodied in this sequence were
made to accommodate the different monocot/dicot codon
usages and other sequence motif considerations discussed
above in the design of the plant-optimized tcdA coding
region. The sequence includes an additional Ala residue
at position #2 to accommodate the NcoI site which spans
the ATG start codon.

SEQ ID NO: 6 gives a sequence coding for the full-length
native TcdA protein (amino acids 22-2537) fused to
the modified 15 kDa zein endoplasmic reticulum signal
peptide (amino acids 1-21).
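
The residue coordinates of the fusion follow directly from the lengths of its two parts. As a small consistency check (a sketch only; the function name is hypothetical, and the lengths are the 21 residues stated for the modified signal peptide and the 2516 residues stated for TcdA):

```python
def fusion_segments(*parts):
    """Assign 1-based, inclusive residue coordinates to consecutive segments."""
    segments = []
    position = 1
    for name, length in parts:
        segments.append((name, position, position + length - 1))
        position += length
    return segments


segments = fusion_segments(
    ("modified 15 kDa zein ER signal peptide", 21),   # 20 native residues + Ala
    ("full-length TcdA", 2516),                       # length stated for TcdA
)

for name, first, last in segments:
    print(f"{name}: amino acids {first}-{last}")
```

The resulting coordinates, 1-21 and 22-2537, agree with those given for SEQ ID NO: 6.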

Example 2 

Transformation Of Tobacco With Agrobacterium Carrying 
Plasmid pDAB2041 Encoding Photorhabdus Toxins 
A. Plasmid pDAB2041 

Preparation of tobacco transformation vectors was
accomplished in three steps. First, a modified
plant-optimized tcdA coding region was ligated into a tobacco
plant expression cassette plasmid. In this step, the
coding region was placed under the transcriptional
control of a promoter functional in tobacco plant cells.
RNA transcription termination and polyadenylation were
mediated by a downstream copy of the terminator region
from the Agrobacterium nopaline synthase gene. Two
plasmids designed to function in this role are pDAB1507
and pDAB2006. In the second step, the complete gene
comprised of the promoter, coding region, and terminator
region was ligated between the T-DNA borders of an
Agrobacterium binary vector, pDAB1542. Also positioned
between the T-DNA borders was a plant selectable marker
gene to allow selection of transformed tobacco plant
cells. In the third step, the engineered binary vector
plasmid was conjugated from its E. coli host strain into
a disabled Agrobacterium tumefaciens strain capable of
transforming tobacco plant cells that regenerate into
fertile transgenic plants.

It is a feature of plasmid pDAB1507 that any coding
region having an NcoI site at its 5' end and a SacI site
3' to the coding region, when cloned into the unique NcoI
and SacI sites of pDAB1507, is placed under the
transcriptional control of an enhanced version of the
CaMV 35S promoter. It is also a feature of pDAB1507 that
the 5' untranslated leader (UTR) sequence preceding the
NcoI site comprises a modified version of the 5' UTR of
the MSV coat protein gene, into which has been cloned an
internally deleted version of the maize Adh1S intron 1.
Additionally, it is a feature of pDAB1507 that
transcription termination and polyadenylation of the mRNA
containing the introduced coding region are mediated by
termination/Poly A addition sequences derived from the
nopaline synthase (Nos) gene. Finally, it is a feature
of pDAB1507 that the entire assembly of promoter/coding
region/3' UTR can be obtained as a single DNA fragment by
cleavage at the flanking NotI sites.
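
The cloning scheme above depends on where three restriction sites fall relative to the coding region. A minimal sketch of a compatibility check follows, assuming (this is an added assumption, not a statement from the text) that internal NcoI, SacI, or NotI sites in the insert would be undesirable. The recognition sequences are the standard ones for these enzymes; the toy sequence and function names are hypothetical.

```python
# Recognition sequences of the enzymes named in the text (all palindromic,
# so searching one strand is sufficient).
SITES = {"NcoI": "CCATGG", "SacI": "GAGCTC", "NotI": "GCGGCCGC"}


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    hits, start = [], seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def cassette_compatible(seq: str) -> bool:
    """True if `seq` has exactly one NcoI site lying upstream of exactly one
    SacI site, and no NotI site that would split the excised cassette."""
    nco, sac, not_sites = (find_sites(seq, SITES[e]) for e in ("NcoI", "SacI", "NotI"))
    return len(nco) == 1 and len(sac) == 1 and nco[0] < sac[0] and not not_sites


# Hypothetical toy insert, not a sequence from the patent.
toy = "CCATGGCTAAGGTTTGAGCTC"
print(cassette_compatible(toy))
```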

It is a feature of plasmid pDAB2006 that any coding
region having an NcoI site at its 5' end and a SacI site
3' to the coding region, when cloned into the unique NcoI
and SacI sites of pDAB2006, is placed under the
transcriptional control of the CaMV 35S promoter. It is
also a feature of pDAB2006 that the 5' untranslated
leader (UTR) sequence preceding the NcoI site comprises a
polylinker. Additionally, it is a feature of pDAB2006
that transcription termination and polyadenylation of the
mRNA containing the introduced coding region are mediated
by termination/Poly A addition sequences derived from the
nopaline synthase (Nos) gene. Finally, it is a feature
of pDAB2006 that the entire assembly of promoter/coding
region/3' UTR can be obtained as a single DNA fragment by
cleavage at the flanking NotI sites.

It is a feature of pDAB1542 that any DNA fragment
flanked by NotI sites can be cloned into the unique NotI
site of pDAB1542, thus placing the introduced fragment
between the T-DNA borders, and adjacent to the neomycin
phosphotransferase II (kanamycin resistance) gene.

To prepare a plant-expressible gene to produce the
non-targeted TcdA protein in tobacco plant cells, DNA of
a plasmid (pA0H_4-OPTI) containing the plant-optimized
tcdA coding region (SEQ ID NO: 3) was cleaved with
restriction enzymes NcoI and SacI, and the large 7550 bp
fragment was ligated to similarly-cut DNA of plasmid
pDAB1507 to produce plasmid pDAB2040. DNA of pDAB2040
was then digested with NotI, and the 8884 bp fragment was
ligated to NotI digested DNA of pDAB1542 to produce
plasmid pDAB2041. This plasmid was then conjugated by
triparental mating [Firoozabady, E., D. L. DeBoer, D. J.
Merlo, E. L. Halk, L. N. Amerson, K. E. Rashka, and E. E.
Murray (1987) Transformation of cotton (Gossypium
hirsutum L.) by Agrobacterium tumefaciens and
regeneration of transgenic plants. Plant Molec. Biol.
10:105-116] from the host Escherichia coli strain
(XL1-Blue, Stratagene, La Jolla, CA) into the nontumorigenic
Agrobacterium tumefaciens strain EHA101S, which is a
spontaneous streptomycin-resistant mutant of strain
EHA101 (Hood, E. E., G. L. Helmer, R. T. Fraley, and
M.-D. Chilton (1986) The hypervirulence of Agrobacterium
tumefaciens A281 is encoded in a region of pTiBo542
outside of T-DNA. J. Bacteriol. 168:1291-1301). Strain
EHA101S (pDAB2041) was then used to produce transgenic
tobacco plants that expressed the TcdA protein.
B. Plasmid pRK2013

To prepare a plant-expressible gene to produce the
endoplasmic reticulum-targeted TcdA protein in tobacco
plant cells, DNA of a plasmid (pA0H_4-ER) containing the
plant-optimized, ER-targeted tcdA coding region (SEQ ID
NO: 6) was cleaved with restriction enzymes NcoI and SacI,
and the large 7610 bp fragment was ligated to
similarly-cut DNA of plasmid pDAB2006 to produce plasmid pDAB1833.
DNA of pDAB1833 was then digested with NotI, and the 8822
bp fragment was ligated to NotI digested DNA of pDAB1542
to produce plasmid pDAB2052. This plasmid was then
conjugated by triparental mating from the host
Escherichia coli strain (XL1-Blue) into the
nontumorigenic Agrobacterium tumefaciens strain EHA101S.
Strain EHA101S (pDAB2052) was then used to produce
transgenic tobacco plants that expressed the TcdA protein
containing an amino terminus endoplasmic reticulum
targeting peptide.

C. Transfer of Plasmid pDAB2041 Into Agrobacterium Strain
EHA101S

Cultures of E. coli carrying the engineered Ti
plasmid pDAB2041 (plasmid containing the rebuilt Toxin A
gene, tcdA), E. coli carrying the plasmid pRK2013, and
Agrobacterium strain EHA101S were grown overnight, then
mixed 1:1:1 on plain LB medium solidified with agar and
cultured in the dark at 28°C. Two days later, the lawn of
bacteria was scraped up with a loop, suspended in plain
LB medium, vortexed, and then diluted 1:10^, 1:10^, and
1:10^ fold in plain LB liquid medium. Aliquots of these
dilutions were spread on selective plates containing
medium YEP plus erythromycin (100 mg/L) and streptomycin
(250 mg/L) and grown at 28°C. Two days later, single
colonies were picked and streaked onto the same medium,
then spread to give single colonies. Single colonies were
picked again and streaked, then spread for single
colonies. Single colonies were picked a third time, grown
as streaks, then subjected to a quality analysis
involving growth on lactose medium and chromogenic assay
with Benedict's reagent. Of ten strains developed in this
way, the fastest coloring colony was chosen for further
work.

D. Transformation Of Tobacco With Agrobacterium Carrying
Plasmid pDAB2041
Tobacco transformation with Agrobacterium
tumefaciens was carried out by a method similar, but not
identical, to published methods (R. Horsch et al., 1988.
Plant Molecular Biology Manual, S. Gelvin et al., eds.,
Kluwer Academic Publishers, Boston). To provide source
tissue for the transformation, tobacco seeds (Nicotiana
tabacum cv. Kentucky 160) were surface sterilized and
planted on the surface of TOB-, which is a hormone-free
Murashige and Skoog medium (T. Murashige and F. Skoog,
1962. A revised medium for rapid growth and bioassays
with tobacco tissue culture. Physiol. Plant. 15: 473-497)
solidified with agar. Plants were grown for 6-8 weeks in
a lighted incubator room at 28-30°C and leaves were
collected sterilely for use in the transformation
protocol. Approximately one cm² pieces were sterilely cut
from these leaves, excluding the midrib. Cultures of the
Agrobacterium strains (EHA101S containing pDAB2041),
which had been grown overnight on a rotor at 28°C, were
pelleted in a centrifuge and resuspended in sterile
Murashige & Skoog salts, adjusted to a final optical
density of 0.7 at 600 nm. Leaf pieces were dipped in
this bacterial suspension for approximately 30 seconds,
then blotted dry on sterile paper towels and placed right
side up on medium TOB+ (Murashige and Skoog medium
containing 1 mg/L indole acetic acid and 2.5 mg/L
benzyladenine) and incubated in the dark at 28°C. Two
days later the leaf pieces were moved to medium TOB+
containing 250 mg/L cefotaxime (Agri-Bio, North Miami,
Florida) and 100 mg/L kanamycin sulfate (Agri-Bio) and
incubated at 28-30°C in the light. Leaf pieces were moved
to fresh TOB+ with cefotaxime and kanamycin twice per
week for the first two weeks and once per week
thereafter. Leaf pieces which showed regrowth of the
Agrobacterium strain were moved to medium TOB+ with
cefotaxime and kanamycin, plus 100 mg/L carbenicillin
(Sigma). Four to six weeks after the leaf pieces were
treated with the bacteria, small plants arising from
transformed foci were removed from this tissue
preparation and planted into medium TOB- containing 250
mg/L cefotaxime and 100 mg/L kanamycin in Magenta GA7
boxes (Magenta Corp., Chicago). These plantlets were
grown in a lighted incubator room. After 3-4 weeks the
primary transgenic plants had rooted and grown to a size
sufficient that leaf samples could be analyzed for
expression of protein from the transgene. Twenty-five
independent transgenic events were recovered as single
plants from the pDAB2041 transformation.
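
The adjustment of the resuspended Agrobacterium culture to an optical density of 0.7 at 600 nm is a simple dilution. The sketch below shows one way to work out the volumes, assuming that OD600 scales linearly with cell density over the range used. The measured reading and final volume are hypothetical values chosen for illustration; only the 0.7 target comes from the text.

```python
def dilution_volumes(od_measured: float, od_target: float,
                     final_volume_ml: float) -> tuple[float, float]:
    """Return (suspension_ml, diluent_ml) giving `od_target` in `final_volume_ml`,
    assuming OD600 is proportional to cell density."""
    if od_measured < od_target:
        raise ValueError("suspension is below the target OD; concentrate it instead")
    suspension_ml = final_volume_ml * od_target / od_measured
    return suspension_ml, final_volume_ml - suspension_ml


# Hypothetical reading of 1.4 for the resuspended pellet; target 0.7 is from the text.
culture, diluent = dilution_volumes(od_measured=1.4, od_target=0.7, final_volume_ml=20.0)
print(f"{culture:.1f} mL suspension + {diluent:.1f} mL MS salts")
```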

Eight independent lines expressing various levels of
transgenic protein from the T-DNA of pDAB2041 were
propagated in vitro from leaf pieces as follows. Twelve
to sixteen approximately one cm² pieces were sterilely cut
from leaves of each primary transgenic plant, excluding
the midrib and all naturally occurring edges. These leaf
pieces were placed on medium TOB+ containing 250 mg/L
cefotaxime and 100 mg/L kanamycin, and cultured in the
lighted incubator at 28-30°C for 3-4 weeks, at which time
small plants could be cut from the proliferating tissue
mass. Several small plantlets from each transgenic line
were moved into Magenta boxes containing medium TOB- plus
cefotaxime and kanamycin and allowed to root and grow.
The proliferating tissue mass was further cultured on
medium TOB+ with cefotaxime and kanamycin, and additional
plants could be cut out and grown up as needed.

Plants were moved into the greenhouse by washing the
agar from the roots, transplanting into soil in 5
square pots, placing the pot into a Ziploc bag
(DowBrands), placing plain water into the bottom of the
bag, and placing in indirect light in a 30°C greenhouse
for one week. After one week the bag could be opened; the
plants were fertilized and allowed to grow further, until
the plants were acclimated and the bag was removed.
Plants were grown under ordinary warm greenhouse
conditions (30°C, 16 h light). Plants were suitable for
sampling four weeks post transplant.



Example 3 

Characterization Of Transgenic Tobacco Plants Expressing
Photorhabdus Toxin That Confers Insect Control

A. Polyclonal Antibody Production 

The E. coli produced recombinant TcdA protein was
purified by a series of column purification steps. The protein
was sent to Berkeley Antibody Company (Richmond, CA) for
the production of antiserum in a rabbit. Inoculations
with the antigen were initiated with 0.5 mg of protein
followed by four boosting injections of 0.25 mg each at
about three week intervals. The rabbit serum was tested
by standard Western analysis using the recombinant
TcdA protein as the antigen and the enhanced
chemiluminescence (ECL) method (Amersham, Arlington Heights,
IL). The antibodies (PAb-EAo) were purified using a PURE I
antibody purification kit (Sigma, St. Louis, MO). PAb-EAo
antibodies recognize the full-length TcdA and its
processed components.

B. Expression Of TcdA Protein In Tobacco 

Protein was extracted from the leaf tissue of
transformed and non-transformed tobacco plants following
the procedure described immediately below.

Two leaf disks of 1.4 cm in diameter were harvested
from the middle portion of a fully expanded leaf. The
disks were placed on a 1.6 x 4 cm piece of SM Whatman
paper. The paper was folded lengthwise and inserted in a
flexible straw. Four hundred microliters of the
extraction buffer (9.5 ml of 0.2 M NaH2PO4, 15.5 ml of 0.2
M Na2HPO4, 2 ml of 0.5 M Na2EDTA, 100 µl of Triton X-100, 1
ml of 10% Sarkosyl, 78 µl of beta-mercaptoethanol, H2O to
bring total volume to 100 ml) was pipetted onto the
paper. The straw containing the sample was then passed
through a rolling device used for squeezing out the
extract. A 1.5 mL microcentrifuge tube was placed at the
other end of the straw to collect the extract. The
extract was centrifuged for 10 minutes at 14,000 rpm in
an Eppendorf refrigerated microcentrifuge. The
supernatant was transferred into a new tube. Protein
quantitation analysis was performed using the standard
Bio-Rad Protein Analysis protocol (Bio-Rad Laboratories,
Hercules, CA). The extract was diluted to 2 mg/ml of
total protein using the extraction buffer.
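
The extraction buffer recipe can be converted to final concentrations with the usual C1·V1 = C2·V2 relation. The sketch below does this for a 100 ml batch. It assumes that the Triton X-100 and beta-mercaptoethanol volumes are microliters (100 µl and 78 µl), since milliliter quantities would not fit in a 100 ml total; that reading is an assumption and should be checked against the original publication.

```python
TOTAL_ML = 100.0

# name: (stock volume in mL, stock concentration, unit of the concentration)
# Triton X-100 and beta-mercaptoethanol volumes are ASSUMED to be microliters.
components = {
    "phosphate (NaH2PO4 + Na2HPO4)": (9.5 + 15.5, 0.2, "M"),
    "Na2EDTA": (2.0, 0.5, "M"),
    "Sarkosyl": (1.0, 10.0, "%"),
    "Triton X-100": (0.100, 100.0, "% v/v"),
    "beta-mercaptoethanol": (0.078, 100.0, "% v/v"),
}


def final_concentration(stock_ml: float, stock_conc: float,
                        total_ml: float = TOTAL_ML) -> float:
    """Solve C1*V1 = C2*V2 for C2."""
    return stock_conc * stock_ml / total_ml


final = {name: final_concentration(v, c) for name, (v, c, _) in components.items()}
for name, (_, _, unit) in components.items():
    print(f"{name}: {final[name]:.3g} {unit}")
```

Under these assumptions the buffer works out to roughly 50 mM phosphate and 10 mM EDTA, with the detergents at about 0.1% each.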

For the detection of transgenic protein, Western
blot analysis was performed. Following a standard
procedure for protein separation (Laemmli, 1970), 40 µg
of protein was loaded in each well of a 4-20% gradient
polyacrylamide gel (Owl Scientific Co., MA) for
electrophoresis. Subsequently, the protein was
transferred onto a nitrocellulose membrane using a semi-dry
electroblotter (Pharmacia LKB Biotechnology,
Piscataway, NJ). The membrane was incubated for one hour
in Blotto (5% milk in TBST solution; 25 mM Tris HCl pH
7.4, 136 mM NaCl, 2.7 mM KCl, 0.1% Tween 20). Thereafter,
Blotto was replaced by the primary antibody solution
(in Blotto). After one hour in the primary antibody, the
membrane was washed with TBST for five minutes three
times. Then the secondary antibody in Blotto (1:2000
dilution of goat anti-rabbit IgG conjugated to
horseradish peroxidase; Bio-Rad Laboratories) was added
to the membrane. After one hour of incubation, the
membrane was washed with an excess amount of TBST for 10
minutes four times. The protein was visualized by using
the enhanced chemiluminescence (ECL) method (Amersham,
Arlington Heights, IL). The differential intensity of
the protein bands was measured using a densitometer
(Molecular Dynamics Inc., Sunnyvale, CA).

To determine the expression of TcdA protein in
tobacco transformed with pDAB2041, PAb-EAo antibodies were
used as the primary antibodies. The expression levels of
TcdA protein varied among independent transformation
events. The primary plant generated from the event
#2041-13 showed the highest level of pre-pro TcdA
expression of extractable protein. When the leaf pieces
from this plant (#2041-13) were used in in vitro
propagation, several plants were obtained. Seven of
these plants were analyzed for the expression of the TcdA
protein. All but one plant produced the full-length TcdA
protein as well as some processed peptide components.
Using antibodies specific to neomycin
phosphotransferase, NPT (5 Prime-3 Prime, Boulder, CO), the
expression of the selectable marker gene (npt II) was
detected. Similar results were obtained for #2041-29.

Table 5
rived from event #2041-13

Plant #      TcdA    NPT (selectable marker)
2041-13A             not done
2041-13B             not done
2041-13-1
2041-13-2    +       +
2041-13-3    +       +
2041-13-4    +
2041-13-5            +

C. Nucleic Acid Analysis of Transgenic Tobacco Lines

Genomic DNA was prepared from a group of 2041
transgenic events. The lines included Magenta box stage
2041-13, and greenhouse stage plants 2041-13-1, 2041-13-2,
2041-13-5, 2041-9, 2041-20A and 2041-20B. A
transgenic GUS line (2023) was included as a negative
control. Southern analysis of these lines was performed.
The genomic tobacco DNA was restricted with the enzyme
SstI, which should result in an 8.9 kb hybridization
product when hybridized to a tcdA gene specific probe.
The 8.9 kb hybridization product should consist of the
35T promoter and the tcdA coding region. All 2041 plants
contained a band of the expected size. Events 2041-9 and
-20 appear to be the same line with 5 identical
hybridizing bands. Event 2041-13 produced 6
hybridization fragments with the tcdA coding region
probe. Magenta box and various greenhouse plants of
2041-13 all produced the same hybridization profile.
This hybridization pattern was different from that of
events 2041-9 and -20.

RNA analysis, using the tcdA coding region probe,
was performed on the same group of greenhouse 2041
plants. Immunoblot analysis had revealed that plants
2041-9, 2041-20A, 2041-20B, and 2041-13-1 produced no
detectable TcdA protein, while 2041-13-2 and 2041-13-5
produced substantial amounts of full-length TcdA.
Northern analysis was in agreement with the immunoblot
result. A faint RNA signal was detected for plants 2041-9,
2041-20A, 2041-20B, and 2041-13-1. Only faintly
visible was a band corresponding to full-length tcdA
transcript in plant 2041-13-1. In contrast, for plants
2041-13-2 and 2041-13-5 a strong RNA signal was detected,
with a substantial amount of full-length size (~8.0 kb)
tcdA transcript. These data support the observed
bioassay activity for this group of plants.

Genomic DNA was prepared from a second functionally
active 2041 transgenic event, 2041-29. Southern analysis
of this line was performed. A transgenic GUS line (2023)
was included as a negative control. DNA of line 2041-9
was included as a positive control.

The genomic tobacco DNAs were restricted with the
enzyme SstI, which should result in an 8.9 kb hybridization
product when hybridized to a tcdA gene specific probe.
The 8.9 kb hybridization product should consist of the
35T promoter and the tcdA coding region. For plant
2041-29-5, three hybridization products larger than 8.9 kb
were detected with the tcdA gene specific probe.
Immunoblot analysis has demonstrated that pre-pro TcdA protein
is made by this plant; it is therefore likely that a
restriction site was lost during transformation or
regeneration, or that the 2041-29 genomic DNA was not
thoroughly digested.

D. Tobacco Leaf-Disk Tests With Tobacco Hornworm
Exhibiting Insect Control

Leaves were sampled from tobacco plants, Nicotiana
tabacum, previously transplanted into the greenhouse. A
single leaf was sampled from each plant on each test
date. Leaves were selected from the zone where younger
elongate leaves transition into older ovate leaves.
Excised leaves were placed into 12 oz. cups with the
petiole submerged in water to maintain turgor, and
transported to the laboratory.

Eight 1.4 cm disks were cut from the center portion
of one side of each leaf (right adaxial side up, with
distal portion facing away from the observer). Each disk
was placed individually into a well of a C-D
International 128 well tray (Pitman, NJ) into which 0.5
ml of a 1.6% aqueous agar solution had been previously
pipetted. The solidified agar prevented the leaf disks
from drying out. The adaxial surface of the disk was
always oriented up.

A single neonate tobacco hornworm, Manduca sexta,
was placed on each disk and the wells were sealed with
vented plastic lids. The assay was held at 27°C and 40%
RH. Larval mortality and live-weight data were collected
after 3 days. Data were subjected to analysis of
variance and Duncan's multiple range test (α = 0.05) (Proc
GLM, SAS Institute Inc., Cary, NC). Data were
transformed using a logarithmic function to correct a
correlation between the magnitude of the mean and
variance.
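
The log transformation followed by analysis of variance can be sketched in a few lines. The example below is plain Python rather than the SAS Proc GLM procedure named in the text, it does not reproduce Duncan's multiple range test, and the larval weights are invented solely to illustrate the calculation.

```python
import math


def one_way_anova_f(groups: list[list[float]]) -> float:
    """F statistic for a one-way ANOVA (between-group MS / within-group MS)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical larval weights (mg), invented for illustration only.
control = [18.0, 16.5, 19.2, 15.1]
treated = [6.1, 7.4, 5.2, 8.0]

# Log-transform each weight before analysis, as described in the text.
log_groups = [[math.log(w) for w in g] for g in (control, treated)]
print(f"F on raw data: {one_way_anova_f([control, treated]):.1f}")
print(f"F on log data: {one_way_anova_f(log_groups):.1f}")
```

When the spread of the data grows with the mean, as is common for weights, working on the log scale tends to make group variances more comparable, which is the motivation given above.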

Table 6
Results of leaf-disk assays from greenhouse grown tobacco

Weight of Surviving Larvae (mg) & Duncan's Group ¹

TRT  Plant                   Plant  Pretest  Test 1      Test 2    Test 3    3 Test
                             Age                                             Sum.
13   non-transformed - 2     young                                 18.8 a*
14   non-transformed - 3     young                                 17.0 ab
16   non-transformed - 5     young                                 16.4 ab
3    2041-13-1 (western -)   young           17.6 a      18.2 a    16.1 ab   17.3 a
9    Gus Control             old    19.3 a   14.6 a      16.3 a    14.5 ab   15.1 a
10   non-transformed - 1     young           8.3 b       16.8 a    13.9 b    13.0 b
11   2041-20B (western -)    old             10.0 b*     13.7 ab   14.6 ab   12.9 b
15   non-transformed - 4     young                                 13.0 bc
8    2041-20A (western -)    old    15.7 a   8.3 b       11.3 bc   9.2 cd    9.6 c
12   2041-9 (western -)      old    19.5 a                         7.9 d
7    2041-13-5 (western +)   young           6.3 bc      9.6 cd    7.2 de    7.7 d
5    2041-13-3 (western +)   young           6.4 bc****  6.2 e     6.8 de**  6.4 de
1    2041-13A (western +)    old    7.2 b    6.8 bc*     7.0 de*   5.4 e     6.4 de
6    2041-13-4 (western +)   young           4.9 c****   5.8 e     7.6 d     6.4 de
4    2041-13-2 (western +)   young           5.7 bc      5.7 e**   7.5 d     6.3 de
2    2041-13B (western +)    old             4.7 c**     5.6 e     7.2 de    5.9 e

* Number of stars corresponds to the number of dead
larvae per 8 tested.
1. Data transformed (logarithm) for analysis.
Means followed by the same letter are not significantly
different (alpha = 0.05).



TABLE 7
Results Of Leaf-Disk Assays From Greenhouse Grown Tobacco
Plants With Event 2041-29

MEAN WGT (MG) / Duncan's Group

Plant          Test 1    Test 2     Test 3       Test 4       Four Test
                                                              Summary
2014-6 GUS 1   15.8 a    16.6 a     **5.5 bc     *12.9 ab     13.2 a
2014-6 GUS 2   14.4 a    *6.6 bc    *13.4 a      15.2 a       12.6 a
KY-160 NTC     13.4 a    6.7 bc     7.9 b        8.5 bc       9.1 b
2041-29 4P     *4.9 b    *7.3 b     ****6.9 b                 6.3 c
2041-29 7      *5.9 b    5.1 bc     ***6.7 b     ***7.2 c     6.1 c
2041-29 3P     *5.6 b    **7.9 b    *****6.5 b   ***3.6 d     5.9 c
2041-29 2P     6.3 b     ****4.7 e               ******4.6 d  5.4 c

* Number of stars corresponds to the number of dead
larvae per 8 tested.
1. Data transformed (logarithm) for analysis.
Means followed by the same letter are not significantly
different (alpha = 0.05).

All event 2041-29 plants significantly depressed THW
larval weight gain compared to control plants. Average
weight depression was 49%. Statistically significant
mortality occurred in THW larvae exposed to foliage from
2041-29 plants. Mortality averaged 37.5% compared to
5.2% in controls.
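
Two of these summary figures can be recomputed from Table 7 as a consistency check. The sketch below uses the four-test summary means for the three control plants and the four 2041-29 plants, together with the control star counts as read from the table (an interpretation of the printed markers). It recovers the stated weight depression and control mortality; the 37.5% figure for the 2041-29 plants is not recomputed here.

```python
# Four-test summary means (mg) from Table 7.
control_means = [13.2, 12.6, 9.1]        # GUS 1, GUS 2, KY-160 NTC
event_means = [6.3, 6.1, 5.9, 5.4]       # the four 2041-29 plants


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


weight_depression = 100 * (1 - mean(event_means) / mean(control_means))

# Dead larvae (stars) in the three control plants over four tests of 8 larvae each.
control_dead = 3 + 2 + 0
control_tested = 3 * 4 * 8
control_mortality = 100 * control_dead / control_tested

print(f"weight depression: {weight_depression:.0f}%")
print(f"control mortality: {control_mortality:.1f}%")
```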


E. Isolation and Characterization of Functional
Photorhabdus Toxin Protein From Transgenic Plants

Seven grams of transgenic tobacco plants (2041-13)
expressing the tcdA (Toxin A) gene were homogenized with 10
ml of 50 mM potassium phosphate buffer, pH 7.0, using a bead
beater (Biospec Products, Bartlesville, OK) according to the
manufacturer's instructions. The homogenate was filtered
through four layers of cheesecloth and then centrifuged
at 35,000 g for 15 min. The supernatant was collected and
filtered through a 0.22 µm Millipore Express™ membrane. It
was then applied to a Superdex 200 column (2.6 x 40 cm)
which had been equilibrated with 20 mM Tris buffer, pH
8.0 (Buffer A). The protein was eluted in Buffer A at a
flow rate of 3 ml/min. Fractions of 3 ml each were
collected and subjected to southern corn rootworm (SCR)
bioassay. It was found that fractions corresponding to a
native molecular weight around 860 kDa had the highest
insecticidal activity. Western analysis of the active
fraction using a polyclonal antibody specific to Toxin A
indicated the presence of full-length TcdA peptide. The
active fractions were further combined and applied to a
Mono Q 10/10 column which had been equilibrated with
Buffer A. Proteins bound to the column were then eluted
by a linear gradient of 0 to 1 M NaCl in Buffer A.
Fractions of 2 ml each were collected and analyzed by
both SCR bioassay and Western analysis using antibody specific to
Toxin A. The results again demonstrated the correlation
between insecticidal activity and presence of full-length
TcdA peptide.

F. Characterization of Progeny Transgenic Plants

The inheritability of the genetically engineered
plants containing the Photorhabdus toxin gene was
evaluated by generating F1 progeny. Progeny were
generated from the 2041-13 event by selfing
expression-positive plants. The 2041-13 plants in the greenhouse
were allowed to self-pollinate. Seed capsules were
collected when mature and were allowed to dry and
after-ripen on the laboratory bench for two weeks. Seed from
the plant designated 2041-13A was surface-sterilized and
distributed on the surface of medium TOB- without
selection, to allow recovery of nonexpressing or
nontransgenic progeny as well as expressing and
segregating transgenic siblings. Seed was germinated in
a lighted incubator room (16 h light, 28°C). After 1
month, fifty-one seedlings, designated 2041-13A-S1
through S51, were distributed into Magenta boxes

self-fertilized 2041-13 plants genetically engineered to
produce the ""204" A toxin. The tests included 6
non-expressing progeny (protein-negative controls), 45 Toxin
A expressors, and 4 non-transformed controls (KY-160).
Results are from three leaf-disk assays (method
previously outlined) where eight disks were used per
test. The data were analyzed using analysis of variance
and were blocked by test.

The treatment effect for each of these analyses
indicated the Pr > F was less than 0.0001. The Toxin A
expressors produced significant control of tobacco
hornworm compared to each of the control groups based on
each of the three measures of efficacy. The two control
groups behaved similarly. Statistical analysis using
ANOVA and an LSD test with alpha equal to 0.01 (or 1%)
showed differences between the 3 groups. The LSD test
indicated that the non-expressors and the non-transformed
plants were similar in larvae weights but the expressors
gave weights significantly lower than either of the other
two groups of plants. These data demonstrated that the
genetic basis for insect control was inheritable and
corresponded to the presence of expressed toxin gene.

Table 8
Tobacco hornworm results from F1 progeny of self-fertilized
2041-13 tobacco plants.

                           Mean Value and Duncan's Grouping ⁴
Treatment Group            Total Weight (mg) ¹   Survivor Weight (mg) ²   Leaf Area (cm²) ³
Non-transformed Control    15.8 a                15.8 a                   1.2 a
Protein-negative Control   16.4 a                16.5 a                   1.2 a
Toxin A Expressor          8.1 b                 9.2 b                    4.9 b

¹ Average insect weight with dead insects considered to
weigh nothing.
² Average insect weight with dead insects excluded from
analysis.
³ Total leaf area remaining per eight leaf disks. Initial
area was approximately 12 cm².
⁴ Means followed by the same letter are not significantly
different (alpha = 0.05).
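
Because the total-weight mean counts dead larvae as zero while the survivor-weight mean excludes them, the ratio of the two gives a rough estimate of the fraction surviving. The sketch below applies that relation to the Table 8 means and also converts remaining leaf area to area consumed. It is an approximation only: the published means are rounded, and the relation holds exactly only when both means are taken over the same pooled larvae on the untransformed scale.

```python
# Means from Table 8: (total weight mg, survivor weight mg, leaf area remaining cm2).
table8 = {
    "Non-transformed Control": (15.8, 15.8, 1.2),
    "Protein-negative Control": (16.4, 16.5, 1.2),
    "Toxin A Expressor": (8.1, 9.2, 4.9),
}
INITIAL_LEAF_AREA_CM2 = 12.0  # approximate starting area per eight disks


def implied_survival(total_weight: float, survivor_weight: float) -> float:
    """total mean = survivor mean * fraction surviving, if dead larvae count as 0 mg."""
    return total_weight / survivor_weight


for group, (total, survivors, leaf_left) in table8.items():
    mortality = 100 * (1 - implied_survival(total, survivors))
    consumed = INITIAL_LEAF_AREA_CM2 - leaf_left
    print(f"{group}: ~{mortality:.0f}% implied mortality, ~{consumed:.1f} cm2 consumed")
```

On these figures the Toxin A expressors show an implied mortality of roughly 12% and noticeably less leaf area consumed than either control group.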
Example 4

Transformation Of Maize With a Vector Carrying Plasmid
pDAB1834 Encoding Photorhabdus Toxins

A. Preparation Of Maize Transformation Vectors
Containing Modified Plant-Optimized tcdA Coding Regions:
Plasmid pDAB1834

Preparation of maize transformation vectors was
accomplished in two steps. First, a modified
plant-optimized tcdA coding region was ligated into a plant
expression cassette plasmid. In this step, the coding
region was placed under the transcriptional control of a
promoter functional in maize plant cells. RNA
transcription termination and polyadenylation were
mediated by a downstream copy of the terminator region
from the Agrobacterium nopaline synthase gene. One
plasmid designed to function in this role is pDAB1538. In
the second step, the complete gene comprised of the
promoter, coding region, and 3' UTR terminator region was
ligated to a plant transformation vector that contained a
plant expressible selectable marker gene which allowed
the selection of transformed maize plant cells amongst a
background of nontransformed cells. An example of such a
vector is pDAB367.

It is a feature of plasmid pDAB1538 that any coding
region having an NcoI site at its 5' end and a SacI site
3' to the coding region, when cloned into the unique NcoI
and SacI sites of pDAB1538, is placed under the
transcriptional control of the maize ubiquitin1 (ubi1)
promoter. It is also a feature of pDAB1538 that the 5'
untranslated leader (UTR) sequence preceding the NcoI
site comprises a polylinker. Additionally, it is a
feature of pDAB1538 that transcription termination and
polyadenylation of the mRNA containing the introduced
coding region are mediated by termination/Poly A addition
sequences derived from the nopaline synthase (Nos) gene.
Finally, it is a feature of pDAB1538 that the entire
assembly of promoter/coding region/3' UTR can be obtained
as a single DNA fragment by cleavage at the flanking NotI
sites.

It is a feature of pDAB367 that the phosphinothricin
acetyl transferase protein, which has as its substrate
phosphinothricin and related compounds, is produced in
plant cells through transcription of its coding region
mediated by the Cauliflower Mosaic Virus 35S promoter and
that termination of transcription plus polyadenylation
are mediated by the nopaline synthase terminator region.
It is further a feature of pDAB367 that any DNA fragment
containing flanking NotI sites can be cloned into the
unique NotI site of pDAB367, thus physically linking the
introduced DNA fragment to the aforementioned selectable
marker gene.

To prepare a maize plant-expressible gene to produce
the endoplasmic reticulum-targeted TcdA protein in plant
cells, DNA of a plasmid (pA0H_4-ER) containing the
plant-optimized, ER-targeted tcdA coding region (SEQ ID NO: 6)
was cleaved with restriction enzymes NcoI and SacI, and
the large 7610 bp fragment was ligated to similarly-cut
DNA of plasmid pDAB1538 to produce plasmid pDAB1832. DNA
of pDAB1832 was then digested with NotI, and the 9984 bp
NotI fragment was ligated into the unique NotI site of
pDAB367 to produce plasmid pDAB1834.
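
The fragment sizes quoted in Examples 2 and 4 allow a simple piece of bookkeeping: subtracting the NcoI/SacI insert from the NotI-excised cassette gives the approximate amount of promoter, leader, and terminator sequence contributed by each expression plasmid. The sketch below is only this arithmetic; the dictionary labels are descriptive and the interpretation of the difference is an assumption, not a statement from the text.

```python
# Fragment sizes (bp) as stated in the text: (NcoI/SacI insert, NotI cassette).
constructs = {
    "pDAB2040 (pDAB1507 backbone, enhanced 35S)": (7550, 8884),
    "pDAB1833 (pDAB2006 backbone, 35S)": (7610, 8822),
    "pDAB1832 (pDAB1538 backbone, ubi1)": (7610, 9984),
}


def regulatory_overhead(insert_bp: int, cassette_bp: int) -> int:
    """Bases in the NotI cassette that are not part of the NcoI/SacI insert."""
    return cassette_bp - insert_bp


overhead = {name: regulatory_overhead(*sizes) for name, sizes in constructs.items()}
for name, bp in overhead.items():
    print(f"{name}: {bp} bp outside the coding-region insert")
```

The larger difference for the ubi1 cassette is consistent with a longer promoter region in pDAB1538 than in the 35S-based plasmids.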

It is a feature of plasmid pDAB1834 that the ubi1
and 35S promoters are encoded on the same DNA strand.

B. Transformation and Regeneration of Transgenic Maize 
Isolates 

Type II callus cultures were initiated from immature
zygotic embryos of the genotype "Hi-II" (Armstrong et
al. (1991) Maize Genet. Coop. Newslett., 65: 92-93).
Embryos were isolated from greenhouse-grown ears from

crosses between Hi-II parent A and Hi-II parent B or F2
embryos derived from a self- or sib-pollination of a
Hi-II plant. Immature embryos (1.5 to 3.5 mm) were cultured
on initiation medium consisting of N6 salts and vitamins
(Chu et al. (1978) The N6 medium and its application to
anther culture of cereal crops. Proc. Symp. Plant Tissue
Culture, Peking Press, 43-56), 1.0 mg/L 2,4-D, 25 mM
L-proline, 100 mg/L casein hydrolysate, 10 mg/L AgNO3, 2.5
g/L GELRITE (Schweizerhall, South Plainfield, NJ), and 20
g/L sucrose, with a pH of 5.8. After four to six weeks
callus was subcultured onto maintenance medium
(initiation medium in which AgNO3 was omitted and
L-proline was reduced to 6 mM). Selection for Type II
callus took place for ca. 12-16 weeks.

Plasmid pDAB1834 was transformed into embryogenic
callus. For blasting, 140 µg of plasmid DNA was
precipitated onto 60 mg of alcohol-rinsed, spherical gold
particles (1.5 - 3.0 µm diameter, Aldrich Chemical Co.,
Inc., Milwaukee, WI) by adding 74 µL of 2.5 M CaCl2 H2O and
30 µL of 0.1 M spermidine (free base) to 300 µL of plasmid
DNA and H2O. The solution was immediately vortexed and
the DNA-coated gold particles were allowed to settle.
The resulting clear supernatant was removed and the gold
particles were resuspended in 1 ml of absolute ethanol.
This suspension was diluted with absolute ethanol to
obtain 15 mg DNA-coated gold/mL.
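
The dilution described above can be checked with a short calculation. The following Python sketch is illustrative only: the quantities are those stated in the text, the variable names are hypothetical, and it assumes that all of the DNA binds to the gold and that the 1 ml resuspension volume is exact.

```python
# Worked arithmetic for the DNA/gold stock; quantities come from the text.
dna_ug = 140.0            # plasmid DNA precipitated (micrograms)
gold_mg = 60.0            # gold particles (milligrams)
resuspension_ml = 1.0     # absolute ethanol used to resuspend the pellet
working_mg_per_ml = 15.0  # target concentration of DNA-coated gold

# Nominal loading; assumes every microgram of DNA ends up on the gold.
dna_ug_per_mg_gold = dna_ug / gold_mg

# C1 * V1 = C2 * V2, solved for the final volume V2.
stock_mg_per_ml = gold_mg / resuspension_ml
final_volume_ml = stock_mg_per_ml * resuspension_ml / working_mg_per_ml
ethanol_to_add_ml = final_volume_ml - resuspension_ml

print(f"nominal loading: {dna_ug_per_mg_gold:.2f} ug DNA per mg gold")
print(f"final volume: {final_volume_ml:.1f} mL "
      f"(add {ethanol_to_add_ml:.1f} mL ethanol)")
```

Under these assumptions the 60 mg/mL suspension is brought to four times its volume to reach 15 mg/mL.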

Approximately 600 mg of embryogenic callus tissue
was spread over the surface of Type II callus maintenance
medium as described herein lacking casein hydrolysate and
L-proline, but supplemented with 0.2 M sorbitol and 0.2 M
mannitol as an osmoticum. Following a 4 h pre-treatment,
tissue was transferred to culture dishes containing
blasting medium (osmotic media solidified with 20 g/L TC
agar (PhytoTechnology Laboratories, LLC, Shawnee Mission,
KS) instead of 7 g/L GELRITE). Helium blasting
accelerated suspended DNA-coated gold particles towards
and into the prepared tissue targets. The device used
was an earlier prototype of that described in US Patent
5,141,131, which is incorporated herein by reference.
Tissues were covered with a stainless steel screen (104
µm openings) and placed under a partial vacuum of 25
inches of Hg in the device chamber. The DNA-coated gold
particles were further diluted 1:1 with absolute ethanol
prior to blasting and were accelerated at the callus
targets four times using a helium pressure of 1500 psi,
with each blast delivering 20 µL of the DNA/gold
suspension. Immediately post-blasting, the tissue was
transferred to osmotic media for a 16-24 h recovery
period. Afterwards, the tissue was divided into small
pieces and transferred to selection medium (maintenance
medium lacking casein hydrolysate and L-proline but
containing 30 mg/L BASTA® (AgrEvo, Berlin, Germany)).
Every four weeks for 3 months, tissue pieces were
non-selectively transferred to fresh selection medium. After
7 weeks and up to 22 weeks, callus sectors found
proliferating against a background of growth-inhibited
tissue were removed and isolated. The resulting
BASTA®-resistant tissue was subcultured biweekly onto fresh
selection medium. Following western analysis, positive
transgenic lines were identified and transferred to
regeneration media. Western-negative lines underwent
subsequent RNA spot blot analysis to identify negative
controls for regeneration.
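
The amount of gold and DNA delivered to each callus target follows from the blasting parameters above. The Python sketch below is a rough estimate, not a measured value: it reads "diluted 1:1" as mixing equal volumes (an assumption), takes the nominal 140 µg DNA per 60 mg gold loading at face value, and uses hypothetical variable names.

```python
# Estimate of gold and DNA delivered per target; inputs are from the text.
working_mg_per_ml = 15.0   # DNA-coated gold before the final dilution
dilution_factor = 2.0      # "1:1" read as equal volumes (assumption)
blast_volume_ul = 20.0     # suspension delivered per blast
blasts_per_target = 4
dna_ug_per_mg_gold = 140.0 / 60.0  # nominal loading, assumes complete binding

blast_mg_per_ml = working_mg_per_ml / dilution_factor
gold_mg_per_blast = blast_mg_per_ml * blast_volume_ul / 1000.0
gold_mg_per_target = gold_mg_per_blast * blasts_per_target
dna_ug_per_target = gold_mg_per_target * dna_ug_per_mg_gold

print(f"{gold_mg_per_blast:.2f} mg gold per blast, "
      f"{gold_mg_per_target:.2f} mg gold and "
      f"{dna_ug_per_target:.2f} ug DNA per target")
```

Actual delivery would be lower than this nominal figure, since not all particles reach or penetrate the tissue.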

Regeneration was initiated by transferring callus
tissue to cytokinin-based induction medium, which
consisted of Murashige and Skoog salts, hereinafter MS
salts, and vitamins (Murashige and Skoog, (1962) Physiol.
Plant. 15: 473-497), 30 g/L sucrose, 100 mg/L
myo-inositol, 30 g/L mannitol, 5 mg/L 6-benzylaminopurine,
hereinafter BAP, 0.025 mg/L 2,4-D, 30 mg/L BASTA®, and
2.5 g/L GELRITE at pH 5.7. The cultures were placed in
low light (125 ft-candles) for one week followed by one
week in high light (325 ft-candles). Following a two
week induction period, tissue was non-selectively
transferred to hormone-free regeneration medium, which
was identical to the induction medium except that it
lacked 2,4-D and BAP, and was kept in high light. Small
(1.5-3 cm) plantlets were removed and placed in 150x25 mm
culture tubes containing SH medium (SH salts and vitamins
(Schenk and Hildebrandt, (1972) Can. J. Bot. 50:199-204),
10 g/L sucrose, 100 mg/L myo-inositol, 5 mL/L FeEDTA, and
2.5 g/L GELRITE, pH 5.8). Plantlets were transferred to
12 cm pots containing approximately 0.25 kg of METRO-MIX
360 (The Scotts Co., Marysville, OH) in the greenhouse as
soon as they exhibited growth and developed a sufficient
root system. They were grown with a 16 h photoperiod
supplemented by a combination of high pressure sodium and
metal halide lamps, and were watered as needed with a
combination of three independent Peters Excel fertilizer
formulations (Grace-Sierra Horticultural Products
Company, Milpitas, CA). At the 6-8 leaf stage, plants
were transplanted to five gallon pots containing
approximately 4 kg METRO-MIX 360, and grown to maturity.

EXAMPLE 5 

Characterization Of Transgenic Maize Plants
Expressing Photorhabdus Toxin That Confer Insect Control
A. Insect Bioassays 

A single leaf was sampled from each plant in each
test. Eight 1.4 cm disks were cut from the outer portion
of each leaf (approximately 30 cm long), avoiding the
center vein. Each disk was placed individually into a
well of a C-D International 128-well tray (Pitman, NJ)
into which 0.5 ml of a 1.6% aqueous agar solution had
been previously pipetted. The solidified agar prevented
the leaf disks from drying out. The adaxial surface of
the disk was always oriented up.

Five neonate southern corn rootworms, Diabrotica
undecimpunctata howardi, were placed on each disk and the
wells were sealed with vented plastic lids. The assay
was held at ll'^C and 40% RH. Larval mortality and live- 
weight data were collected after 3 days. Data were
subjected to analysis of variance and Duncan's multiple
range test (α = 0.05) (Proc GLM, SAS Institute Inc., Cary,
NC). Weight data were transformed using a logarithmic
function to correct a correlation between the magnitude
of the mean and variance.

TABLE 9

Results of Maize Leaf-disk Test vs SCR

Treatment       Mean % Kill     Mean Survival Weight (mg)
                (Duncan's)      (Duncan's)
1834-11         68 A            0.064 A
1834-17         44 B            0.098 B
1834-15         26 BC           0.127 C
HiII control    13 C            0.161 C

Note: Means followed by the same letter are not
significantly different based on Duncan's multiple range
test (alpha=0.05). Insect groups weighing less than 0.1
mg were set to 0.03 mg instead of zero to conduct a more
conservative analysis. Mortality (arcsin(sqrt)) and
weight (log10) data were transformed for analyses.
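
The two transformations named in the note can be illustrated as follows. This Python sketch is not the SAS procedure used for the analysis; the function names are hypothetical, and the Table 9 treatment means are used only as convenient inputs (the actual analysis would have transformed the replicate-level data before the analysis of variance).

```python
import math

def arcsine_sqrt(percent):
    """Arcsine square-root transform of a percentage (0-100), in radians."""
    return math.asin(math.sqrt(percent / 100.0))

def log10_weight(weight_mg):
    """Base-10 log transform of a positive weight in mg."""
    return math.log10(weight_mg)

# Illustrative inputs: the treatment means reported in Table 9.
mean_percent_kill = [68.0, 44.0, 26.0, 13.0]
mean_weight_mg = [0.064, 0.098, 0.127, 0.161]

transformed_kill = [arcsine_sqrt(p) for p in mean_percent_kill]
transformed_weight = [log10_weight(w) for w in mean_weight_mg]

for p, t in zip(mean_percent_kill, transformed_kill):
    print(f"{p:5.1f} % kill -> {t:.3f} rad")
for w, t in zip(mean_weight_mg, transformed_weight):
    print(f"{w:.3f} mg -> log10 = {t:.3f}")
```

Both transforms are monotonic, so they preserve the ranking of treatments while stabilizing the variance.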


The results shown in Table 9 demonstrated that two events
expressing TcdA protein were statistically distinct from
control lines bioassayed using SCR neonates by mortality and
survival weight criteria. These results demonstrated that
southern corn rootworm were functionally affected by feeding
on maize plants containing and expressing the tcdA gene.
Plants from event 1834-11 were used to generate progeny for
testing heritability of the transgene.

B. PRODUCTION AND PROGENY TEST OF tcdA TRANSGENIC MAIZE

Origin and growth of progeny plants: Sibling plants
1834-11-07 and 1834-11-08, clonally derived by regeneration
from the callus of transgenic maize event 1834-11, were
transplanted to the greenhouse and pollinated with inbred
OQ414. Seeds obtained from these crosses, comprising seed
lots 1834-11-07A and 1834-11-08A, were planted in
Rootrainers (1 }i inch x 2 inch x 8 inch deep, product 

#647, C. Hummert Intl., Earth City, Mo.) filled with
Metro-Mix 360 soilless mix (Scotts Terra-Lite, available
from Hummert Intl.) and top irrigated with Hoagland's
nutrient solution. (Hoagland's solution contains 229 ppm
nitrogen as nitrate, 24.6 ppm nitrogen as ammonium, 26
ppm P, 157 ppm K, 187 ppm Ca, 49 ppm Mg, and 30 ppm Na.)

Greenhouse conditions for this trial were: 16 hour
days, daylight supplemented by metal halide lamps as
needed to achieve a minimum of 600 µEinsteins/cm² PAR, and
ambient temperature 30°C days, 22°C nights.

Leaves were sampled for protein determination 
approximately one week after planting. Leaf bioassays 
were conducted 2-3 weeks after planting; root bioassays 
were initiated approximately 3 weeks post planting. 


Protein analysis of progeny plants: Protein was extracted
from leaf and root samples harvested from transgenic
plants, line 1834-11 progenies, and non-transformed
plants. Each sample was placed on a 1.6 x 4 cm piece of
3M Whatman™ paper. The paper was folded lengthwise and
inserted in a flexible straw. A volume of 350 µl of an
extraction buffer (9.5 ml of 0.2 M NaH2PO4, 15.5 ml of 0.2
M Na2HPO4, 2 ml of 0.5 M Na2EDTA, 100 µl of Triton X-100,
1 ml of 10% Sarkosyl, 78 µl of beta-mercaptoethanol, H2O
to bring total volume to 100 ml, 50 µg/ml Antipain, 50
µg/ml Leupeptin, 0.1 mM Chymostatin, 5 µg/ml Pepstatin)
was pipetted onto the paper. The straw containing the
sample was then passed through a rolling device used for
squeezing the extract into a 1.5 ml microcentrifuge tube.
The extract was centrifuged for 10 minutes at 14,000 rpm
in an Eppendorf refrigerated micro-centrifuge. The
supernatant was transferred into a new tube. The amount
of the total extractable protein was determined using a
standard BioRad Protein Analysis protocol (BioRad
Laboratories, Hercules, CA).

The presence of the TcdA protein was visualized by
Western blot analysis following a standard procedure for
protein separation (Laemmli, 1970). A volume of twenty
µl of extract was loaded in each well of a 4-20% gradient
polyacrylamide gel (Owl Scientific Co., MA) for
electrophoresis. Subsequently, the protein was
transferred onto a nitrocellulose membrane using a
semi-dry electroblotter (Pharmacia LKB Biotechnology,
Piscataway, NJ). The membrane was incubated for one hour
in TBST-M solution (10% milk in TBST solution; 25 mM Tris
HCl pH 7.4, 1.36 mM NaCl, 2.7 mM KCl, 0.1% Tween 20).
Thereafter, the primary antibody (Anti-TcdA in TBST-M)
was added. After one hour, the membrane was washed with
TBST for five minutes, three times. Then the secondary
antibody solution (goat anti-rabbit IgG conjugated to
horseradish peroxidase; Bio-Rad Laboratories, in TBST-M)
was added to the membrane. After one hour of incubation,
the membrane was washed with an excess amount of TBST for
10 minutes, four times. The protein was visualized using
the Super Signal® West Pico chemiluminescence method
(Pierce Chemical Co., Rockford, IL). The protein blot
was exposed on a Hyper-film (Amersham, Arlington Heights,
IL) and was developed within 3 minutes. The intensity of
the protein band was measured using a densitometer
(Molecular Dynamics Inc., Sunnyvale, CA) and compared to
standards.

Three of six plants from seed lot 1834-11-07A and
three of six plants from seed lot 1834-11-08A produced
detectable levels of TcdA protein (Table 10).
Approximately 3.8 to 13.3 ppm of TcdA were detected in
the leaf blades and 4.1 to 8.4 ppm were detected in the
leaf tips of the protein-positive plants. The amounts of
TcdA protein detected in the roots were slightly lower
than those found in the leaves.

Insect bioassays with progeny plants: Plants were
selected for bioassay based on results from Western blot
analysis. Twelve (12) 6.4 mm diameter leaf discs were
cut from the youngest leaf of each 2 week old seedling.
Each disc was placed in a well of a 128-well tray (CD
International) containing approximately 0.5 mL of a
solidified 2% agar in water solution. Two neonate
southern corn rootworm, Diabrotica undecimpunctata
howardi (Barber) (SCR), were placed in each well with a
leaf disc. Trays were covered with perforated lids and
maintained under a controlled environment for 3 days
(28°C; 16 hours light: 8 hours dark; approx. 60% relative
humidity). Living larvae from 4 leaf discs were pooled
and weighed, producing 3 weight determinations per plant.
Average weights were calculated by dividing the pooled
weight by the number of survivors. Differences in
average weights of SCR fed leaf discs from protein
positive and protein negative plants were assessed using
analysis of variance on the natural log-transformed
average weights (Minitab, v. 12.2, Minitab Inc., State
College, PA).
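
The weight calculation described above (pooled weight divided by the number of survivors, followed by a natural log transform) may be sketched in Python as follows. The pooled weights and survivor counts shown are hypothetical values chosen for illustration and are not data from this example; the function name is likewise an assumption.

```python
import math

def average_survivor_weight(pooled_weight_mg, n_survivors):
    """Pooled larval weight divided by the number of living larvae.

    Returns None when a pool has no survivors, since no average exists.
    """
    if n_survivors <= 0:
        return None
    return pooled_weight_mg / n_survivors

# Hypothetical pools for one plant: 4 discs x 2 larvae = up to 8 larvae each.
pools = [(1.04, 8), (0.84, 7), (0.0, 0)]

averages = [average_survivor_weight(w, n) for w, n in pools]
ln_averages = [math.log(a) for a in averages if a is not None]

print(averages)
print(ln_averages)
```

Pools with no survivors are skipped rather than assigned a weight of zero, because the logarithm of zero is undefined; how such pools were actually handled is not stated in the text.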

Root bioassays were initiated approximately 1 week
after the initiation of the leaf disc bioassays.
Approximately 24 h prior to eclosion, SCR eggs were
suspended in a 0.15% solution of agar in water to a
concentration of 100 eggs/ml. Plants were inoculated
with SCR eggs by pipetting 2.0 ml of the egg suspension
(i.e., approximately 200 eggs) just below the soil surface
at the base of each plant. Two weeks after inoculation,
plants were removed from their Rootrainer pots, their
roots washed free of potting mix, and scored for rootworm
damage based on a 1 (resistant) to 9 (susceptible) rating
system (Welch, 1977). The results of the root ratings
were examined using non-parametric tests to determine if
the distribution of root ratings from the protein
positive plants was the same as the distribution of the
ratings from the protein negative plants. Testing was
done at the 5% significance level (StatXact v. 3, CYTEL
Software Corporation, Cambridge, MA).

Results from leaf and root bioassays of TcdA protein
positive and protein negative progeny plants are
summarized in Table 10. The average weights of SCR
larvae fed leaf discs from protein positive plants were
significantly lower than those of larvae fed leaf discs
from protein negative plants (F = 4.6; d.f. = 1, 34; P <
0.001). The Kolmogorov-Smirnov 2 sample test (p=0.04) and
the Wald-Wolfowitz runs test (p=0.001) indicated that
the protein positive and protein negative root rating
distributions were not similar. The Wilcoxon-Mann-Whitney
test (p=0.0206) and the Normal Scores test
(p=0.206) indicated that the average root rating for the
protein positive plants was lower than the average root
rating from the protein negative plants.

Table 10. Protein analysis and insect bioassay results
with progeny of TcdA transgenic maize.

Plant Number      TcdA      Leaf Disc Bioassay   Root Bioassay
                  Protein   Avg. Wt. (mg)        Root Rating (1-9)
1834-11-07A-30    PRO-      0.190                8
1834-11-08A-21    PRO-      0.196                9
1834-11-08A-16    PRO-      0.195                9
1834-11-08A-14    PRO-      0.137                9
1834-11-07A-22    PRO-      0.208                9
1834-11-07A-20    PRO-      0.175                9
1834-11-07A-26    PRO+      0.118                9
1834-11-08A-17    PRO+      0.132                8
1834-11-07A-14    PRO+      0.110                2
1834-11-07A-11    PRO+      0.106                4
1834-11-08A-28    PRO+      0.129                8
1834-11-08A-27    PRO+      0.108                4
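
The rank-based comparison of root ratings can be illustrated with the Table 10 values. The Python sketch below computes only the Wilcoxon-Mann-Whitney U statistic with midranks for ties; it does not compute a p-value and is not a reimplementation of the StatXact exact tests used above, so it should not be expected to reproduce the reported p-values. The function names are hypothetical.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples using midranks."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    return u_a, n_a * n_b - u_a

# Root ratings from Table 10 (1 = resistant, 9 = susceptible).
protein_positive = [9, 8, 2, 4, 8, 4]
protein_negative = [8, 9, 9, 9, 9, 9]

u_pos, u_neg = mann_whitney_u(protein_positive, protein_negative)
print(u_pos, u_neg)
```

A small U for the protein positive group relative to the maximum of 36 reflects its generally lower (more resistant) root ratings.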



DNA analysis of progeny plants: Leaf samples from
1834-11-07A and 1834-11-08A progeny plants were placed in
conical 50 ml polypropylene tubes and dried in a Labconco
Freeze Dry Lyophilizer (Kansas City, MO) for 1-2 days.
Lyophilized leaves were then ground in a Tecator Cyclotec
1093 Sample Mill grinder (Hoganas, Sweden) and stored at
-20°C. Genomic DNA was extracted by the following
procedure: (1) to a 25 ml conical tube containing 300-500 mg
of ground tissue, 9 ml of CTAB (cetyl trimethyl ammonium
bromide solution) was added, and incubated at 65°C for 1
hour; (2) 4.5 ml of chloroform:octanol (24:1) was added and
mixed gently for 5 minutes; (3) samples were centrifuged at
2000 rpm and DNA was precipitated from the supernatant
with an equal volume of isopropanol; (4) DNA was
collected on a glass hook, washed in ethanol, and
dissolved in TE (10 mM Tris-HCl, 0.5 mM EDTA, pH 8.0).

Genomic DNA was digested at 37°C for 2 hours in an
Eppendorf tube containing the following mixture:
8 µl of 800 µg/ml DNA, 2 µl 1 mg/ml BSA (bovine serum
albumin), 2 µl 10x buffer, 1 µl SacI, 1 µl EcoRI, and 6 µl
H2O. Digested DNA samples were electrophoresed overnight
at 40 mA in a 0.85% SeaKem LE agarose gel (FMC, Rockland,
Maine). The gel was blotted onto Millipore Immobilon-Ny+
(Bedford, MA) membrane overnight in 20X SSC (NaCl 175.2
g/l, Na citrate 88 g/l). The probe DNA was cut with
BamHI/SacI (NEB, Beverly, MA) from pDAB1551 plasmid,
which released a 7356 bp fragment containing the open
reading frame of the rebuilt tcdA gene. This 7356 bp
fragment was labeled with 32P using a Stratagene Prime-It
RmT dCTP-Labeling Reactions kit (La Jolla, CA) and used
for Southern hybridization. Hybridization was conducted
in hybridization buffer (10% polyethylene glycol, 7% SDS
[sodium dodecyl sulfate], 0.6X SSC, 10 mM NaPO4, 5 mM
EDTA, 10 µg/ml denatured salmon sperm) at 60°C overnight.
After hybridization, the membrane was washed with 10X SSC
plus 0.1% SDS at 60°C for 30 min and exposed to X-ray
film (Hyperfilm® MP, Amersham Life Sciences, Piscataway,
NJ) for 1-2 days.
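
The composition of the restriction digest above can be verified with simple arithmetic. The Python sketch below uses the component volumes and stock concentrations stated in the text; the dictionary keys and variable names are illustrative only.

```python
# Component volumes (microliters) of the digest described in the text.
components_ul = {
    "genomic DNA (800 ug/ml)": 8.0,
    "BSA (1 mg/ml)": 2.0,
    "10x buffer": 2.0,
    "SacI": 1.0,
    "EcoRI": 1.0,
    "H2O": 6.0,
}

total_ul = sum(components_ul.values())

# Mass of genomic DNA in the reaction: volume (ml) x concentration (ug/ml).
dna_ug = components_ul["genomic DNA (800 ug/ml)"] / 1000.0 * 800.0

# Final concentrations after dilution into the total reaction volume.
buffer_final_x = 10.0 * components_ul["10x buffer"] / total_ul
bsa_final_mg_per_ml = 1.0 * components_ul["BSA (1 mg/ml)"] / total_ul

print(f"total volume: {total_ul:.0f} ul, DNA: {dna_ug:.1f} ug")
print(f"buffer: {buffer_final_x:.1f}x, BSA: {bsa_final_mg_per_ml:.2f} mg/ml")
```

The volumes sum to a 20 µl reaction in which the 10x buffer is diluted to 1x, as expected for a standard digest.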



Results summarized indicate that a pattern of 8
hybridizing bands (the size of the expected fragment and
larger) cosegregated with protein expression in 50% of
all progeny assayed. These results are characteristic of
a complex insertion at a single site. All seedlings
containing the insert also expressed toxin protein.
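
Segregation in 50% of progeny from an outcross is what a single hemizygous insertion site would be expected to give (a 1:1 ratio). As an illustration, the Python sketch below applies a chi-square goodness-of-fit test against a 1:1 ratio to the protein-positive and protein-negative counts reported above (three of six plants in each of the two seed lots). This test is not part of the analysis described in the text, the function name is hypothetical, and with only twelve plants the test has little power.

```python
import math

def chi_square_1to1(n_positive, n_negative):
    """Chi-square goodness-of-fit test against a 1:1 ratio (1 d.f.).

    Returns (chi2, p_value). For one degree of freedom the upper-tail
    probability is erfc(sqrt(chi2 / 2)).
    """
    total = n_positive + n_negative
    if total == 0:
        raise ValueError("at least one observation is required")
    expected = total / 2.0
    chi2 = ((n_positive - expected) ** 2
            + (n_negative - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Six protein-positive and six protein-negative progeny (from the text).
chi2, p_value = chi_square_1to1(6, 6)
print(chi2, p_value)
```

A larger progeny population would be needed to distinguish a single-locus 1:1 ratio from alternatives such as 3:1 with any confidence.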

Example 6 

Transformation Of Rice With a Vector Carrying Plasmid
pDAB1553 Encoding Photorhabdus Toxins

A. Plasmid pDAB1553

Plasmid pDAB1553, containing tcdA driven by the maize
ubiquitin1 promoter and hpt (hygromycin
phosphotransferase, providing resistance to the antibiotic
hygromycin) under the control of 35T (a modified 35S
promoter), was used for transformation.

Preparation of rice transformation vectors was
accomplished in two steps. First, a modified
plant-optimized tcdA coding region was ligated into a rice
plant expression cassette plasmid. In this step, the
coding region was placed under the transcriptional
control of a promoter functional in plant cells. RNA
transcription termination and polyadenylation were
mediated by a downstream copy of the terminator region
from the Agrobacterium nopaline synthase gene. One
plasmid designed to function in this role is plasmid
pDAB1538 (described in the section on maize
transformation vectors). In the second step, the
complete gene comprised of the promoter, coding region,
and terminator region was ligated to a rice plant
transformation vector that contained a plant expressible
selectable marker gene which allowed the selection of
transformed rice plant cells amongst a background of
nontransformed cells. An example of such a vector is
pDAB354-Notl.

It is a feature of pDAB354-Notl that the hygromycin
phosphotransferase protein, which has as its substrate
hygromycin B and related compounds, is produced in plant
cells through transcription of its coding region mediated
by the Cauliflower Mosaic Virus 35S promoter and that
termination of transcription plus polyadenylation are
mediated by the nopaline synthase terminator region. It
is further a feature of pDAB354-Notl that any DNA
fragment containing flanking NotI sites can be cloned
into the unique NotI site of pDAB354-Notl, thus
physically linking the introduced DNA fragment to the
aforementioned selectable marker gene.

To prepare a plant-expressible gene to produce the
non-targeted TcdA protein in rice plant cells, DNA of a
plasmid (pA0H_4-OPTI) containing the plant-optimized tcdA
coding region (SEQ ID NO: 3) was cleaved with restriction
enzymes NcoI and SacI, and the large 7550 bp fragment was
ligated to similarly-cut DNA of plasmid pDAB1538 to
produce plasmid pDAB1551. DNA of pDAB1551 was then
digested with NotI, and the large 9933 bp fragment was
ligated to NotI-digested DNA of pDAB354-Notl to produce
plasmid pDAB1553.

It is a feature of plasmid pDAB1553 that the ubi1
and 35S promoters are encoded on the same DNA strand.

B. Production of Rice Transgenics

For initiation of embryogenic callus, mature seeds
of a Japonica cultivar, Taipei 309, were dehusked and
surface-sterilized in 70% ethanol for 2-5 min, followed
by a 30-45 min soak in 50% commercial bleach (2.6% sodium
hypochlorite) with a few drops of 'Liquinox' soap. The
seeds were then rinsed 3 times in sterile distilled water
and placed on filter paper before transferring to 'callus
induction' medium (i.e., NB). The NB medium consisted of
N6 macro elements (Chu, 1978, The N6 medium and its
application to anther culture of cereal crops. Proc.
Symp. Plant Tissue Culture, Peking Press, p43-56), B5
micro elements and vitamins (Gamborg et al., 1968,
Nutrient requirements of suspension cultures of soybean
root cells. Exp. Cell Res. 50: 151-158), 300 mg/L casein
hydrolysate, 500 mg/L L-proline, 500 mg/L L-glutamine, 30
g/L sucrose, 2 mg/L 2,4-dichloro-phenoxyacetic acid
(2,4-D), and 2.5 g/L gelrite (Schweizerhall, NJ) with the pH
adjusted to 5.8. The mature seed cultured on 'induction'
media were incubated in the dark at 28°C. After 3 weeks
of culture, the emerging primary callus induced from the
scutellar region of mature embryo was transferred to
fresh NB medium for further maintenance.

About 140 µg of plasmid pDAB1553 DNA was
precipitated onto 60 mg of 1.0 micron (Bio-Rad) gold
particles as described herein.

For helium blasting, actively growing embryogenic
callus cultures, 2-4 mm in size, were subjected to a high
osmoticum treatment. This treatment included placing of
callus on NB medium with 0.2 M mannitol and 0.2 M
sorbitol (Vain et al., 1993, Osmoticum treatment enhances
particle bombardment-mediated transient and stable
transformation of maize. Plant Cell Rep. 12: 84-88) for
4 h before helium blasting. Following osmoticum
treatment, callus cultures were transferred to 'blasting'
medium (NB+2% agar) and covered with a stainless steel
screen (230 micron). The callus cultures were blasted at
2,000 psi helium pressure twice per target. After
blasting, callus was transferred back to the media with
high osmoticum overnight before placing on selection
medium, which consisted of NB medium with 30 mg/L
hygromycin. After 2 weeks, the cultures were transferred
to fresh selection medium with a higher concentration of
selection agent, i.e., NB+50 mg/L hygromycin (Li et al.,
1993, An improved rice transformation system using the
biolistic method. Plant Cell Rep. 12: 250-255).

Compact, white-yellow, embryogenic callus cultures,
recovered on NB+50 mg/L hygromycin, were regenerated by
transferring to 'pre-regeneration' (PR) medium + 50 mg/L
hygromycin. The PR medium consisted of NB medium with 2
mg/L benzyl aminopurine (BAP), 1 mg/L naphthalene acetic
acid (NAA), and 5 mg/L abscisic acid (ABA). After 2
weeks of culture in the dark, they were transferred to
'regeneration' (RN) medium. The composition of RN
medium is NB medium with 3 mg/L BAP and 0.5 mg/L NAA.
The cultures on RN medium were incubated for 2 weeks at
28°C under high fluorescent light (325 ft-candles). The
plantlets with 2 cm shoots were transferred to 1/2 MS
medium (Murashige and Skoog, 1962, A revised medium for
rapid growth and bioassays with tobacco tissue cultures.
Physiol. Plant. 15: 473-497) with 1/2 B5 vitamins, 10 g/L
sucrose, 0.05 mg/L NAA, 50 mg/L hygromycin and 2.5 g/L
gelrite adjusted to pH 5.8 in magenta boxes. When
plantlets were established with well-developed root
systems, they were transferred to soil (1 metromix: 1 top
soil) and raised in the greenhouse (29/24°C day/night
cycle, 50-60% humidity, 12 h photoperiod) until maturity.



EXAMPLE 7 

Characterization Of Transgenic Rice Plants Expressing
Photorhabdus Toxin That Confer Insect Control

A. Insect bioassays

Insect bioassays were performed using leaf discs, and the
plants were shown to be highly effective in controlling
southern corn rootworm. Diabrotica undecimpunctata howardi
eggs are obtained from French Ag Research and hatched in
petri dishes held at 28.5°C and 40% RH. The aerial parts are
sampled from the transgenic plants and placed singly
into inverted petri dishes (100x15 mm) containing 15 ml of
1.6% aqueous agar in the bottom to provide humidity and
filter paper in the top to absorb condensation. These
preparations are infested with five neonate larvae per
dish and held at 28.5°C and 40% RH for 3 days. Mortality
and larval weights are recorded. Weight data were
transformed using a logarithmic function to correct a
correlation between the magnitude of the mean and
variance.



Table 11

Treatment     Average Survivor     Presence of TcdA in greenhouse-grown
              Weight in mg         plants (number of +/number of plants
              (Duncan's Grouping)  tested)
GUS Control   0.390 A
1553-33       0.170 BCD
1553-44       0.167 BCD            +++
1553-62       0.125 CD             +++
1553-41       0.100 D              +++

Note: Means followed by the same letter are
not significantly different based on Duncan's
multiple range test (alpha=0.05).
Insect groups weighing less than 0.1 mg were set to 0.03 mg
instead of zero to conduct a more conservative analysis.
Weight data were transformed (log10) for analyses. A single
replicate was used on each of three test dates. Plants were
sampled from magenta boxes.

The results demonstrate that, in leaf disc bioassays, several
rice events derived by transformation with the tcdA gene had
a statistically significant functional effect on corn
rootworm neonates.

Claims

1. An isolated nucleic acid of SEQ ID NO: 3 or SEQ ID
NO: 4.

2. A transgenic monocot cell having a genome comprising
SEQ ID NO: 3 or SEQ ID NO: 4.

3. A transgenic dicot cell having a genome comprising
SEQ ID NO: 3 or SEQ ID NO: 4.

4. A transgenic plant with a genome comprising a
nucleic acid of SEQ ID NO: 3 or SEQ ID NO: 4 that imparts
insect resistance.

5. A transgenic plant of claim 4 wherein the plant is
rice.

6. A transgenic plant of claim 4 wherein the plant is
maize.

7. A transgenic plant of claim 4 wherein the plant is
tobacco.

SEQUENCE LISTING

<110> Petell, Jim 

      Merlo, Donald
      Herman, Rod
      Roberts, Jean
      Guo, Lining
      Schafer, Barry
      Sukhapinda, Kitisri
      Owens Merlo, Ann

<120> Transgenic Plants Expressing Photorhabdus Toxin

<130> 50698

<140>
<141>

<150> US 60/148,356

<151> 1999-08-11

<160> 8

<170> PatentIn Ver. 2.0

<210> 1

<211> 7551

<212> DNA

<213> Photorhabdus luminescens

<220>

<221> CDS

<222> (1)..(7548)

<400> 1

atg aac gag tct gta aaa gag ata cct gat gta tta aaa agc cag tgt  48
Met Asn Glu Ser Val Lys Glu Ile Pro Asp Val Leu Lys Ser Gln Cys
1 5 10 15

ggt ttt aat tgt ctg aca gat att agc cac agc tct ttt aat gaa ttt  96
Gly Phe Asn Cys Leu Thr Asp Ile Ser His Ser Ser Phe Asn Glu Phe
20 25 30

cgc cag caa gta tct gag cac ctc tcc tgg tcc gaa aca cac gac tta  144
Arg Gln Gln Val Ser Glu His Leu Ser Trp Ser Glu Thr His Asp Leu
35 40 45

tat cat gat gca caa cag gca caa aag gat aat cgc ctg tat gaa gcg  192
Tyr His Asp Ala Gln Gln Ala Gln Lys Asp Asn Arg Leu Tyr Glu Ala
50 55 60

cgt att ctc aaa cgc gcc aat ccc caa tta caa aat gcg gtg cat ctt  240
Arg Ile Leu Lys Arg Ala Asn Pro Gln Leu Gln Asn Ala Val His Leu
65 70 75 80

gcc att ctc gct ccc aat gct gaa ctg ata ggc tat aac aat caa ttt  288
Ala Ile Leu Ala Pro Asn Ala Glu Leu Ile Gly Tyr Asn Asn Gln Phe
85 90 95

agc ggt aga gcc agt caa tat gtt gcg ccg ggt acc gtt tct tcc atg  336
Ser Gly Arg Ala Ser Gln Tyr Val Ala Pro Gly Thr Val Ser Ser Met
100 105 110

ttc tcc ccc gcc gct tat ttg act gaa ctt tat cgt gaa gca cgc aat  384
Phe Ser Pro Ala Ala Tyr Leu Thr Glu Leu Tyr Arg Glu Ala Arg Asn
115 120 125

tta cac gca agt gac tcc gtt tat tat ctg gat acc cgc cgc cca gat  432
Leu His Ala Ser Asp Ser Val Tyr Tyr Leu Asp Thr Arg Arg Pro Asp
130 135 140

ctc aaa tca atg gcg ctc agt cag caa aat atg gat ata gaa tta tcc  480
Leu Lys Ser Met Ala Leu Ser Gln Gln Asn Met Asp Ile Glu Leu Ser
145 150 155 160

aca ccc tct ttg tcc aat gag ctg tta ttg gaa agc att aaa act gaa  528
Thr Leu Ser Leu Ser Asn Glu Leu Leu Leu Glu Ser Ile Lys Thr Glu
165 170 175

tct aaa ctg gaa aac tat act aaa gtg atg gaa atg ctc tcc act ttc  576
Ser Lys Leu Glu Asn Tyr Thr Lys Val Met Glu Met Leu Ser Thr Phe
180 185 190

cgt cct tcc ggc gca acg cct tat cat gat gct tat gaa aat gtg cgt  624
Arg Pro Ser Gly Ala Thr Pro Tyr His Asp Ala Tyr Glu Asn Val Arg
195 200 205

gaa gtt atc cag cta caa gat cct gga ctt gag caa ctc aat gca tca  672
Glu Val Ile Gln Leu Gln Asp Pro Gly Leu Glu Gln Leu Asn Ala Ser
210 215 220

ccg gca att gcc ggg ttg atg cat caa gcc tcc cta ttg ggt att aac  720
Pro Ala Ile Ala Gly Leu Met His Gln Ala Ser Leu Leu Gly Ile Asn
225 230 235 240

gct tca atc tcg cct gag cta ttt aat att ctg acg gag gag att acc  768
Ala Ser Ile Ser Pro Glu Leu Phe Asn Ile Leu Thr Glu Glu Ile Thr
245 250 255

gaa ggt aat gct gag gaa ctt tat aag aaa aat ttt ggt aat atc gaa  816
Glu Gly Asn Ala Glu Glu Leu Tyr Lys Lys Asn Phe Gly Asn Ile Glu
260 265 270

ccg gcc tca ttg gct atg ccg gaa tac ctt aaa cgt tat tat aat tta  864
Pro Ala Ser Leu Ala Met Pro Glu Tyr Leu Lys Arg Tyr Tyr Asn Leu
275 280 285

agc gat gaa gaa ctt agt cag ttt att ggt aaa gcc agc aat ttt ggt  912
Ser Asp Glu Glu Leu Ser Gln Phe Ile Gly Lys Ala Ser Asn Phe Gly
290 295 300

caa cag gaa tat agt aat aac caa ctt att act ccg gta gtc aac agc  960
Gln Gln Glu Tyr Ser Asn Asn Gln Leu Ile Thr Pro Val Val Asn Ser
305 310 315 320

agt gat ggc acg gtt aag gta tat cgg atc acc cgc gaa tat aca acc  1008
Ser Asp Gly Thr Val Lys Val Tyr Arg Ile Thr Arg Glu Tyr Thr Thr
325 330 335

aat gct tat caa atg gat gtg gag cta ttt ccc ttc ggt ggt gag aat  1056
Asn Ala Tyr Gln Met Asp Val Glu Leu Phe Pro Phe Gly Gly Glu Asn
340 345 350

tat cgg tta gat tat aaa ttc aaa aat ttt tat aat gcc tct tat tta  1104
Tyr Arg Leu Asp Tyr Lys Phe Lys Asn Phe Tyr Asn Ala Ser Tyr Leu
355 360 365

tcc atc aag tta aat gat aaa aga gaa ctt gtt cga act gaa ggc gct  1152
Ser Ile Lys Leu Asn Asp Lys Arg Glu Leu Val Arg Thr Glu Gly Ala
370 375 380

cct caa gtc aat ata gaa tac tcc gca aat atc aca tta aat acc gct  1200
Pro Gln Val Asn Ile Glu Tyr Ser Ala Asn Ile Thr Leu Asn Thr Ala
385 390 395 400

gat atc agt caa cct ttt gaa att ggc ctg aca cga gta ctt cct tcc  1248
Asp Ile Ser Gln Pro Phe Glu Ile Gly Leu Thr Arg Val Leu Pro Ser
405 410 415

ggt tct tgg gca tat gcc gcc gca aaa ttt acc gtt gaa gag tat aac  1296
Gly Ser Trp Ala Tyr Ala Ala Ala Lys Phe Thr Val Glu Glu Tyr Asn
420 425 430

caa tac tct ttt ctg cta aaa ctt aac aag gct att cgt cta tca cgt  1344
Gln Tyr Ser Phe Leu Leu Lys Leu Asn Lys Ala Ile Arg Leu Ser Arg
435 440 445

gcg aca gaa ttg tca ccc acg att ctg gaa ggc att gtg cgc agt gtt  1392
Ala Thr Glu Leu Ser Pro Thr Ile Leu Glu Gly Ile Val Arg Ser Val
450 455 460

aat cta caa ctg gat atc aac aca gac gta tta ggt aaa gtt ttt ctg  1440
Asn Leu Gln Leu Asp Ile Asn Thr Asp Val Leu Gly Lys Val Phe Leu
465 470 475 480

act aaa tat tat atg cag cgt tat gct att cat gct gaa act gcc ctg  1488
Thr Lys Tyr Tyr Met Gln Arg Tyr Ala Ile His Ala Glu Thr Ala Leu
485 490 495

ata cta tgc aac gcg cct att tca caa cgt tca tat gat aat caa cct  1536
Ile Leu Cys Asn Ala Pro Ile Ser Gln Arg Ser Tyr Asp Asn Gln Pro
500 505 510

agc caa ttt gat cgc ctg ttt aat acg cca tta ctg aac gga caa tat  1584
Ser Gln Phe Asp Arg Leu Phe Asn Thr Pro Leu Leu Asn Gly Gln Tyr
515 520 525

ttt tct acc ggc gat gag gag att gat tta aat tca ggt agc acc ggc  1632
Phe Ser Thr Gly Asp Glu Glu Ile Asp Leu Asn Ser Gly Ser Thr Gly
530 535 540

gat tgg cga aaa acc ata ctt aag cgt gca ttt aat att gat gat gtc  1680
Asp Trp Arg Lys Thr Ile Leu Lys Arg Ala Phe Asn Ile Asp Asp Val
545 550 555 560

teg etc ttc cgc ctg ctt aaa att acc gac cat gat aat aaa gat gga 
Ser Leu Phe Arg Leu Leu Lys He Thr Asp His Asp Asn Lys Asp Gly 
565 570 575 

aaa att aaa aat aac eta aag aat ctt tec aat tta tat att gga aaa 1776 
Lys He Lys Asn Asn Leu Lys Asn Leu Ser Asn Leu Tyr He Gly Lys 
580 585 590 



1728 
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tta ctg gca gat att cat caa tta acc att gat gaa ctg gat tta tta 
Leu Leu Ala Asp He His Gin Leu Thr lie Asp Glu Leu Asp Leu Leu 
595 600 605 



i824 



ctg att gcc gta ggt gaa gga aaa act aat tta tcc gct atc agt gat 1872
Leu Ile Ala Val Gly Glu Gly Lys Thr Asn Leu Ser Ala Ile Ser Asp
610 615 620

aag caa ttg gct acc ctg atc aga aaa ctc aat act att acc agc tgg 1920
Lys Gln Leu Ala Thr Leu Ile Arg Lys Leu Asn Thr Ile Thr Ser Trp
625 630 635 640

cta cat aca cag aag tgg agt gta ttc cag cta ttt atc atg acc tcc 1968
Leu His Thr Gln Lys Trp Ser Val Phe Gln Leu Phe Ile Met Thr Ser
645 650 655

acc agc tat aac aaa acg cta acg cct gaa att aag aat ttg ctg gat 2016
Thr Ser Tyr Asn Lys Thr Leu Thr Pro Glu Ile Lys Asn Leu Leu Asp
660 665 670

acc gtc tac cac ggt tta caa ggt ttt gat aaa gac aaa gca gat ttg 2064
Thr Val Tyr His Gly Leu Gln Gly Phe Asp Lys Asp Lys Ala Asp Leu
675 680 685

cta cat gtc atg gcg ccc tat att gcg gcc acc ttg caa tta tca tcg 2112
Leu His Val Met Ala Pro Tyr Ile Ala Ala Thr Leu Gln Leu Ser Ser
690 695 700

gaa aat gtc gcc cac tcg gta ctc ctt tgg gca gat aag tta cag ccc 2160
Glu Asn Val Ala His Ser Val Leu Leu Trp Ala Asp Lys Leu Gln Pro
705 710 715 720

ggc gac ggc gca atg aca gca gaa aaa ttc tgg gac tgg ttg aat act 2208
Gly Asp Gly Ala Met Thr Ala Glu Lys Phe Trp Asp Trp Leu Asn Thr
725 730 735

aag tat acg ccg ggt tca tcg gaa gcc gta gaa acg cag gaa cat atc 2256
Lys Tyr Thr Pro Gly Ser Ser Glu Ala Val Glu Thr Gln Glu His Ile
740 745 750

gtt cag tat tgt cag gct ctg gca caa ttg gaa atg gtt tac cat tcc 2304
Val Gln Tyr Cys Gln Ala Leu Ala Gln Leu Glu Met Val Tyr His Ser
755 760 765

acc ggc atc aac gaa aac gcc ttc cgt cta ttt gtg aca aaa cca gag 2352
Thr Gly Ile Asn Glu Asn Ala Phe Arg Leu Phe Val Thr Lys Pro Glu
770 775 780

atg ttt ggc gct gca act gga gca gcg ccc gcg cat gat gcc ctt tca 2400
Met Phe Gly Ala Ala Thr Gly Ala Ala Pro Ala His Asp Ala Leu Ser
785 790 795 800

ctg att atg ctg aca cgt ttt gcg gat tgg gtg aac gca cta ggc gaa 2448
Leu Ile Met Leu Thr Arg Phe Ala Asp Trp Val Asn Ala Leu Gly Glu
805 810 815

aaa gcg tcc tcg gtg cta gcg gca ttt gaa gct aac tcg tta acg gca 2496
Lys Ala Ser Ser Val Leu Ala Ala Phe Glu Ala Asn Ser Leu Thr Ala
820 825 830

gaa caa ctg gct gat gcc atg aat ctt gat gct aat ttg ctg ttg caa 2544
Glu Gln Leu Ala Asp Ala Met Asn Leu Asp Ala Asn Leu Leu Leu Gln
835 840 845

gcc agt att caa gca caa aat cat caa cat ctt ccc cca gta act cca 2592
Ala Ser Ile Gln Ala Gln Asn His Gln His Leu Pro Pro Val Thr Pro
850 855 860

gaa aat gcg ttc tcc tgt tgg aca tct atc aat act atc ctg caa tgg 2640
Glu Asn Ala Phe Ser Cys Trp Thr Ser Ile Asn Thr Ile Leu Gln Trp
865 870 875 880

gtt aat gtc gca caa caa ttg aat gtc gcc cca cag ggc gtt tcc gct 2688
Val Asn Val Ala Gln Gln Leu Asn Val Ala Pro Gln Gly Val Ser Ala
885 890 895



ttg gtc ggg ctg gat tat att caa tca atg aaa gag aca ccg acc tat 2736
Leu Val Gly Leu Asp Tyr Ile Gln Ser Met Lys Glu Thr Pro Thr Tyr
900 905 910

gcc cag tgg gaa aac gcg gca ggc gta tta acc gcc ggg ttg aat tca 2784
Ala Gln Trp Glu Asn Ala Ala Gly Val Leu Thr Ala Gly Leu Asn Ser
915 920 925

caa cag gct aat aca tta cac gct ttt ctg gat gaa tct cgc agt gcc 2832
Gln Gln Ala Asn Thr Leu His Ala Phe Leu Asp Glu Ser Arg Ser Ala
930 935 940

gca tta agc acc tac tat atc cgt caa gtc gcc aag gca gcg gcg gct 2880
Ala Leu Ser Thr Tyr Tyr Ile Arg Gln Val Ala Lys Ala Ala Ala Ala
945 950 955 960

att aaa agc cgt gat gac ttg tat caa tac cta ctg att gat aat cag 2928
Ile Lys Ser Arg Asp Asp Leu Tyr Gln Tyr Leu Leu Ile Asp Asn Gln
965 970 975

gtt tct gcg gca ata aaa acc acc cgg atc gcc gaa gcc att gcc agt 2976
Val Ser Ala Ala Ile Lys Thr Thr Arg Ile Ala Glu Ala Ile Ala Ser
980 985 990

att caa ctg tac gtc aac cgg gca ttg gaa aat gtg gaa gaa aat gcc 3024
Ile Gln Leu Tyr Val Asn Arg Ala Leu Glu Asn Val Glu Glu Asn Ala
995 1000 1005

aat tcg ggg gtt atc agc cgc caa ttc ttt atc gac tgg gac aaa tac 3072
Asn Ser Gly Val Ile Ser Arg Gln Phe Phe Ile Asp Trp Asp Lys Tyr
1010 1015 1020

aat aaa cgc tac agc act tgg gcg ggt gtt tct caa tta gtt tac tac 3120
Asn Lys Arg Tyr Ser Thr Trp Ala Gly Val Ser Gln Leu Val Tyr Tyr
1025 1030 1035 1040

ccg gaa aac tat att gat ccg acc atg cgt atc gga caa acc aaa atg 3168
Pro Glu Asn Tyr Ile Asp Pro Thr Met Arg Ile Gly Gln Thr Lys Met
1045 1050 1055

atg gac gca tta ctg caa tcc gtc agc caa agc caa tta aac gcc gat 3216
Met Asp Ala Leu Leu Gln Ser Val Ser Gln Ser Gln Leu Asn Ala Asp
1060 1065 1070

acc gtc gaa gat gcc ttt atg tct tat ctg aca tcg ttt gaa caa gtg 3264
Thr Val Glu Asp Ala Phe Met Ser Tyr Leu Thr Ser Phe Glu Gln Val
1075 1080 1085

gct aat ctt aaa gtt att agc gca tat cac gat aat att aat aac gat 3312
Ala Asn Leu Lys Val Ile Ser Ala Tyr His Asp Asn Ile Asn Asn Asp
1090 1095 1100

caa ggg ctg acc tat ttt atc gga ctc agt gaa act gat gcc ggt gaa 3360
Gln Gly Leu Thr Tyr Phe Ile Gly Leu Ser Glu Thr Asp Ala Gly Glu
1105 1110 1115 1120

tat tat tgg cgc agt gtc gat cac agt aaa ttc aac gac ggt aaa ttc 3408
Tyr Tyr Trp Arg Ser Val Asp His Ser Lys Phe Asn Asp Gly Lys Phe
1125 1130 1135

gcg gct aat gcc tgg agt gaa tgg cat aaa att gat tgt cca att aac 3456
Ala Ala Asn Ala Trp Ser Glu Trp His Lys Ile Asp Cys Pro Ile Asn
1140 1145 1150

cct tat aaa agc act atc cgt cca gtg ata tat aaa tcc cgc ctg tat 3504
Pro Tyr Lys Ser Thr Ile Arg Pro Val Ile Tyr Lys Ser Arg Leu Tyr
1155 1160 1165

ctg ctc tgg ttg gaa caa aag gag atc acc aaa cag aca gga aat agt 3552
Leu Leu Trp Leu Glu Gln Lys Glu Ile Thr Lys Gln Thr Gly Asn Ser
1170 1175 1180

aaa gat ggc tat caa act gaa acg gat tat cgt tat gaa cta aaa ttg 3600
Lys Asp Gly Tyr Gln Thr Glu Thr Asp Tyr Arg Tyr Glu Leu Lys Leu
1185 1190 1195 1200

gcg cat atc cgc tat gat ggc act tgg aat acg cca atc acc ttt gat 3648
Ala His Ile Arg Tyr Asp Gly Thr Trp Asn Thr Pro Ile Thr Phe Asp
1205 1210 1215

gtc aat aaa aaa ata tcc gag cta aaa ctg gaa aaa aat aga gcg ccc 3696
Val Asn Lys Lys Ile Ser Glu Leu Lys Leu Glu Lys Asn Arg Ala Pro
1220 1225 1230

gga ctc tat tgt gcc ggt tat caa ggt gaa gat acg ttg ctg gtg atg 3744
Gly Leu Tyr Cys Ala Gly Tyr Gln Gly Glu Asp Thr Leu Leu Val Met
1235 1240 1245

ttt tat aac caa caa gac aca cta gat agt tat aaa aac gct tca atg 3792
Phe Tyr Asn Gln Gln Asp Thr Leu Asp Ser Tyr Lys Asn Ala Ser Met
1250 1255 1260

caa gga cta tat atc ttt gct gat atg gca tcc aaa gat atg acc cca 3840
Gln Gly Leu Tyr Ile Phe Ala Asp Met Ala Ser Lys Asp Met Thr Pro
1265 1270 1275 1280

gaa cag agc aat gtt tat cgg gat aat agc tat caa caa ttt gat acc 3888
Glu Gln Ser Asn Val Tyr Arg Asp Asn Ser Tyr Gln Gln Phe Asp Thr
1285 1290 1295

aat aat gtc aga aga gtg aat aac cgc tat gca gag gat tat gag att 3936
Asn Asn Val Arg Arg Val Asn Asn Arg Tyr Ala Glu Asp Tyr Glu Ile
1300 1305 1310

cct tcc tcg gta agt agc cgt aaa gac tat ggt tgg gga gat tat tac 3984
Pro Ser Ser Val Ser Ser Arg Lys Asp Tyr Gly Trp Gly Asp Tyr Tyr
1315 1320 1325

ctc agc atg gta tat aac gga gat att cca act atc aat tac aaa gcc 4032
Leu Ser Met Val Tyr Asn Gly Asp Ile Pro Thr Ile Asn Tyr Lys Ala
1330 1335 1340

gca tca agt gat tta aaa atc tat atc tca cca aaa tta aga att att 4080
Ala Ser Ser Asp Leu Lys Ile Tyr Ile Ser Pro Lys Leu Arg Ile Ile
1345 1350 1355 1360

cat aat gga tat gaa gga cag aag cgc aat caa tgc aat ctg atg aat 4128
His Asn Gly Tyr Glu Gly Gln Lys Arg Asn Gln Cys Asn Leu Met Asn
1365 1370 1375

aaa tat ggc aaa cta ggt gat aaa ttt att gtt tat act agc ttg ggg 4176
Lys Tyr Gly Lys Leu Gly Asp Lys Phe Ile Val Tyr Thr Ser Leu Gly
1380 1385 1390

gtc aat cca aat aac tcg tca aat aag ctc atg ttt tac ccc gtc tat 4224
Val Asn Pro Asn Asn Ser Ser Asn Lys Leu Met Phe Tyr Pro Val Tyr
1395 1400 1405

caa tat agc gga aac acc agt gga ctc aat caa ggg aga cta cta ttc 4272
Gln Tyr Ser Gly Asn Thr Ser Gly Leu Asn Gln Gly Arg Leu Leu Phe
1410 1415 1420

cac cgt gac acc act tat cca tct aaa gta gaa gct tgg att cct gga 4320
His Arg Asp Thr Thr Tyr Pro Ser Lys Val Glu Ala Trp Ile Pro Gly
1425 1430 1435 1440



gca aaa cgt tct cta acc aac caa aat gcc gcc att ggt gat gat tat 4368
Ala Lys Arg Ser Leu Thr Asn Gln Asn Ala Ala Ile Gly Asp Asp Tyr
1445 1450 1455

gct aca gac tct ctg aat aaa ccg gat gat ctt aag caa tat atc ttt 4416
Ala Thr Asp Ser Leu Asn Lys Pro Asp Asp Leu Lys Gln Tyr Ile Phe
1460 1465 1470

atg act gac agt aaa ggg act gct act gat gtc tca ggc cca gta gag 4464
Met Thr Asp Ser Lys Gly Thr Ala Thr Asp Val Ser Gly Pro Val Glu
1475 1480 1485

att aat act gca att tct cca gca aaa gtt cag ata ata gtc aaa gcg 4512
Ile Asn Thr Ala Ile Ser Pro Ala Lys Val Gln Ile Ile Val Lys Ala
1490 1495 1500

ggt ggc aag gag caa act ttt acc gca gat aaa gat gtc tcc att cag 4560
Gly Gly Lys Glu Gln Thr Phe Thr Ala Asp Lys Asp Val Ser Ile Gln
1505 1510 1515 1520

cca tca cct agc ttt gat gaa atg aat tat caa ttt aat gcc ctt gaa 4608
Pro Ser Pro Ser Phe Asp Glu Met Asn Tyr Gln Phe Asn Ala Leu Glu
1525 1530 1535

ata gac ggt tct ggt ctg aat ttt att aac aac tca gcc agt att gat 4656
Ile Asp Gly Ser Gly Leu Asn Phe Ile Asn Asn Ser Ala Ser Ile Asp
1540 1545 1550

gtt act ttt acc gca ttt gcg gag gat ggc cgc aaa ctg ggt tat gaa 4704
Val Thr Phe Thr Ala Phe Ala Glu Asp Gly Arg Lys Leu Gly Tyr Glu
1555 1560 1565

agt ttc agt att cct gtt acc ctc aag gta agt acc gat aat gcc ctg 4752
Ser Phe Ser Ile Pro Val Thr Leu Lys Val Ser Thr Asp Asn Ala Leu
1570 1575 1580

acc ctg cac cat aat gaa aat ggt gcg caa tat atg caa tgg caa tcc 4800
Thr Leu His His Asn Glu Asn Gly Ala Gln Tyr Met Gln Trp Gln Ser
1585 1590 1595 1600

tat cgt acc cgc ctg aat act cta ttt gcc cgc cag ttg gtt gca cgc 4848
Tyr Arg Thr Arg Leu Asn Thr Leu Phe Ala Arg Gln Leu Val Ala Arg
1605 1610 1615

gcc acc acc gga atc gat aca att ctg agt atg gaa act cag aat att 4896
Ala Thr Thr Gly Ile Asp Thr Ile Leu Ser Met Glu Thr Gln Asn Ile
1620 1625 1630

cag gaa ccg cag tta ggc aaa ggt ttc tat gct acg ttc gtg ata cct 4944
Gln Glu Pro Gln Leu Gly Lys Gly Phe Tyr Ala Thr Phe Val Ile Pro
1635 1640 1645

ccc tat aac cta tca act cat ggt gat gaa cgt tgg ttt aag ctt tat 4992
Pro Tyr Asn Leu Ser Thr His Gly Asp Glu Arg Trp Phe Lys Leu Tyr
1650 1655 1660

atc aaa cat gtt gtt gat aat aat tca cat att atc tat tca ggc cag 5040
Ile Lys His Val Val Asp Asn Asn Ser His Ile Ile Tyr Ser Gly Gln
1665 1670 1675 1680

cta aca gat aca aat ata aac atc aca tta ttt att cct ctt gat gat 5088
Leu Thr Asp Thr Asn Ile Asn Ile Thr Leu Phe Ile Pro Leu Asp Asp
1685 1690 1695

gtc cca ttg aat caa gat tat cac gcc aag gtt tat atg acc ttc aag 5136
Val Pro Leu Asn Gln Asp Tyr His Ala Lys Val Tyr Met Thr Phe Lys
1700 1705 1710

aaa tca cca tca gat ggt acc tgg tgg ggc cct cac ttt gtt aga gat 5184
Lys Ser Pro Ser Asp Gly Thr Trp Trp Gly Pro His Phe Val Arg Asp
1715 1720 1725

gat aaa gga ata gta aca ata aac cct aaa tcc att ttg acc cat ttt 5232
Asp Lys Gly Ile Val Thr Ile Asn Pro Lys Ser Ile Leu Thr His Phe
1730 1735 1740

gag agc gtc aat gtc ctg aat aat att agt agc gaa cca atg gat ttc 5280
Glu Ser Val Asn Val Leu Asn Asn Ile Ser Ser Glu Pro Met Asp Phe
1745 1750 1755 1760

agc ggc gct aac agc ctc tat ttc tgg gaa ctg ttc tac tat acc ccg 5328
Ser Gly Ala Asn Ser Leu Tyr Phe Trp Glu Leu Phe Tyr Tyr Thr Pro
1765 1770 1775

atg ctg gtt gct caa cgt ttg ctg cat gaa cag aac ttc gat gaa gcc 5376
Met Leu Val Ala Gln Arg Leu Leu His Glu Gln Asn Phe Asp Glu Ala
1780 1785 1790

aac cgt tgg ctg aaa tat gtc tgg agt cca tcc ggt tat att gtc cac 5424
Asn Arg Trp Leu Lys Tyr Val Trp Ser Pro Ser Gly Tyr Ile Val His
1795 1800 1805

ggc cag att cag aac tac cag tgg aac gtc cgc ccg tta ctg gaa gac 5472
Gly Gln Ile Gln Asn Tyr Gln Trp Asn Val Arg Pro Leu Leu Glu Asp
1810 1815 1820

acc agt tgg aac agt gat cct ttg gat tcc gtc gat cct gac gcg gta 5520
Thr Ser Trp Asn Ser Asp Pro Leu Asp Ser Val Asp Pro Asp Ala Val
1825 1830 1835 1840

gca cag cac gat cca atg cac tac aaa gtt tca act ttt atg cgt acc 5568
Ala Gln His Asp Pro Met His Tyr Lys Val Ser Thr Phe Met Arg Thr
1845 1850 1855

ttg gat cta ttg ata gca cgc ggc gac cat gct tat cgc caa ctg gaa 5616
Leu Asp Leu Leu Ile Ala Arg Gly Asp His Ala Tyr Arg Gln Leu Glu
1860 1865 1870

cga gat aca ctc aac gaa gcg aag atg tgg tat atg caa gcg ctg cat 5664
Arg Asp Thr Leu Asn Glu Ala Lys Met Trp Tyr Met Gln Ala Leu His
1875 1880 1885

cta tta ggt gac aaa cct tat cta ccg ctg agt acg aca tgg agt gat 5712
Leu Leu Gly Asp Lys Pro Tyr Leu Pro Leu Ser Thr Thr Trp Ser Asp
1890 1895 1900

cca cga cta gac aga gcc gcg gat atc act acc caa aat gct cac gac 5760
Pro Arg Leu Asp Arg Ala Ala Asp Ile Thr Thr Gln Asn Ala His Asp
1905 1910 1915 1920

agc gca ata gtc gct ctg cgg cag aat ata cct aca ccg gca cct tta 5808
Ser Ala Ile Val Ala Leu Arg Gln Asn Ile Pro Thr Pro Ala Pro Leu
1925 1930 1935

tca ttg cgc agc gct aat acc ctg act gat ctc ttc ctg ccg caa atc 5856
Ser Leu Arg Ser Ala Asn Thr Leu Thr Asp Leu Phe Leu Pro Gln Ile
1940 1945 1950

aat gaa gtg atg atg aat tac tgg cag aca tta gct cag aga gta tac 5904
Asn Glu Val Met Met Asn Tyr Trp Gln Thr Leu Ala Gln Arg Val Tyr
1955 1960 1965

aat ctg cgt cat aac ctc tct atc gac ggc cag ccg tta tat ctg cca 5952
Asn Leu Arg His Asn Leu Ser Ile Asp Gly Gln Pro Leu Tyr Leu Pro
1970 1975 1980

atc tat gcc aca ccg gcc gat ccg aaa gcg tta ctc agc gcc gcc gtt 6000
Ile Tyr Ala Thr Pro Ala Asp Pro Lys Ala Leu Leu Ser Ala Ala Val
1985 1990 1995 2000

gcc act tct caa ggt gga ggc aag cta ccg gaa tca ttt atg tcc ctg 6048
Ala Thr Ser Gln Gly Gly Gly Lys Leu Pro Glu Ser Phe Met Ser Leu
2005 2010 2015

tgg cgt ttc ccg cac atg ctg gaa aat gcg cgc ggc atg gtt agc cag 6096
Trp Arg Phe Pro His Met Leu Glu Asn Ala Arg Gly Met Val Ser Gln
2020 2025 2030

ctc acc cag ttc ggc tcc acg tta caa aat att atc gaa cgt cag gac 6144
Leu Thr Gln Phe Gly Ser Thr Leu Gln Asn Ile Ile Glu Arg Gln Asp
2035 2040 2045

gcg gaa gcg ctc aat gcg tta tta caa aat cag gcc gcc gag ctg ata 6192
Ala Glu Ala Leu Asn Ala Leu Leu Gln Asn Gln Ala Ala Glu Leu Ile
2050 2055 2060

ttg act aac ctg agc att cag gac aaa acc att gaa gaa ttg gat gcc 6240
Leu Thr Asn Leu Ser Ile Gln Asp Lys Thr Ile Glu Glu Leu Asp Ala
2065 2070 2075 2080

gag aaa acg gtg ttg gaa aaa tcc aaa gcg gga gca caa tcg cgc ttt 6288
Glu Lys Thr Val Leu Glu Lys Ser Lys Ala Gly Ala Gln Ser Arg Phe
2085 2090 2095

gat agc tac ggc aaa ctg tac gat gag aat atc aac gcc ggt gaa aac 6336
Asp Ser Tyr Gly Lys Leu Tyr Asp Glu Asn Ile Asn Ala Gly Glu Asn
2100 2105 2110

caa gcc atg acg cta cga gcg tcc gcc gcc ggg ctt acc acg gca gtt 6384
Gln Ala Met Thr Leu Arg Ala Ser Ala Ala Gly Leu Thr Thr Ala Val
2115 2120 2125

cag gca tcc cgt ctg gcc ggt gcg gcg gct gat ctg gtg cct aac atc 6432
Gln Ala Ser Arg Leu Ala Gly Ala Ala Ala Asp Leu Val Pro Asn Ile
2130 2135 2140

ttc ggc ttt gcc ggt ggc ggc agc cgt tgg ggg gct atc gct gag gcg 6480
Phe Gly Phe Ala Gly Gly Gly Ser Arg Trp Gly Ala Ile Ala Glu Ala
2145 2150 2155 2160

aca ggt tat gtg atg gaa ttc tcc gcg aat gtt atg aac acc gaa gcg 6528
Thr Gly Tyr Val Met Glu Phe Ser Ala Asn Val Met Asn Thr Glu Ala
2165 2170 2175

gat aaa att agc caa tct gaa acc tac cgt cgt cgc cgt cag gag tgg 6576
Asp Lys Ile Ser Gln Ser Glu Thr Tyr Arg Arg Arg Arg Gln Glu Trp
2180 2185 2190

gag atc cag cgg aat aat gcc gaa gcg gaa ttg aag caa atc gat gct 6624
Glu Ile Gln Arg Asn Asn Ala Glu Ala Glu Leu Lys Gln Ile Asp Ala
2195 2200 2205

cag ctc aaa tca ctc gct gta cgc cgc gaa gcc gcc gta ttg cag aaa 6672
Gln Leu Lys Ser Leu Ala Val Arg Arg Glu Ala Ala Val Leu Gln Lys
2210 2215 2220

acc agt ctg aaa acc caa caa gaa cag acc caa tct caa ttg gcc ttc 6720
Thr Ser Leu Lys Thr Gln Gln Glu Gln Thr Gln Ser Gln Leu Ala Phe
2225 2230 2235 2240

ctg caa cgt aag ttc agc aat cag gcg tta tac aac tgg ctg cgt ggt 6768
Leu Gln Arg Lys Phe Ser Asn Gln Ala Leu Tyr Asn Trp Leu Arg Gly
2245 2250 2255

cga ctg gcg gcg att tac ttc cag ttc tac gat ttg gcc gtc gcg cgt 6816
Arg Leu Ala Ala Ile Tyr Phe Gln Phe Tyr Asp Leu Ala Val Ala Arg
2260 2265 2270

tgc ctg atg gca gaa caa gct tac cgt tgg gaa ctc aat gat gac tct 6864
Cys Leu Met Ala Glu Gln Ala Tyr Arg Trp Glu Leu Asn Asp Asp Ser
2275 2280 2285



gcc cgc ttc att aaa ccg ggc gcc tgg cag gga acc tat gcc ggt ctg 6912
Ala Arg Phe Ile Lys Pro Gly Ala Trp Gln Gly Thr Tyr Ala Gly Leu
2290 2295 2300

ctt gca ggt gaa acc ttg atg ctg agt ctg gca caa atg gaa gac gct 6960
Leu Ala Gly Glu Thr Leu Met Leu Ser Leu Ala Gln Met Glu Asp Ala
2305 2310 2315 2320

cat ctg aaa cgc gat aaa cgc gca tta gag gtt gaa cgc aca gta tcg 7008
His Leu Lys Arg Asp Lys Arg Ala Leu Glu Val Glu Arg Thr Val Ser
2325 2330 2335

ctg gcc gaa gtt tat gca gga tta cca aaa gat aac ggt cca ttt tcc 7056
Leu Ala Glu Val Tyr Ala Gly Leu Pro Lys Asp Asn Gly Pro Phe Ser
2340 2345 2350

ctg gct cag gaa att gac aag ctg gtg agt caa ggt tca ggc agt gcc 7104
Leu Ala Gln Glu Ile Asp Lys Leu Val Ser Gln Gly Ser Gly Ser Ala
2355 2360 2365

ggc agt ggt aat aat aat ttg gcg ttc ggc gcc ggc acg gac act aaa 7152
Gly Ser Gly Asn Asn Asn Leu Ala Phe Gly Ala Gly Thr Asp Thr Lys
2370 2375 2380

acc tct ttg cag gca tca gtt tca ttc gct gat ttg aaa att cgt gaa 7200
Thr Ser Leu Gln Ala Ser Val Ser Phe Ala Asp Leu Lys Ile Arg Glu
2385 2390 2395 2400

gat tac ccg gca tcg ctt ggc aaa att cga cgt atc aaa cag atc agc 7248
Asp Tyr Pro Ala Ser Leu Gly Lys Ile Arg Arg Ile Lys Gln Ile Ser
2405 2410 2415

gtc act ttg ccc gcg cta ctg gga ccg tat cag gat gta cag gca ata 7296
Val Thr Leu Pro Ala Leu Leu Gly Pro Tyr Gln Asp Val Gln Ala Ile
2420 2425 2430

ttg tct tac ggc gat aaa gcc gga tta gct aac ggc tgt gaa gcg ctg 7344
Leu Ser Tyr Gly Asp Lys Ala Gly Leu Ala Asn Gly Cys Glu Ala Leu
2435 2440 2445

gca gtt tct cac ggt atg aat gac agc ggc caa ttc cag ctc gat ttc 7392
Ala Val Ser His Gly Met Asn Asp Ser Gly Gln Phe Gln Leu Asp Phe
2450 2455 2460

aac gat ggc aaa ttc ctg cca ttc gaa ggc atc gcc att gat caa ggc 7440
Asn Asp Gly Lys Phe Leu Pro Phe Glu Gly Ile Ala Ile Asp Gln Gly
2465 2470 2475 2480

acg ctg aca ctg agc ttc cca aat gca tct atg ccg gag aaa ggt aaa 7488
Thr Leu Thr Leu Ser Phe Pro Asn Ala Ser Met Pro Glu Lys Gly Lys
2485 2490 2495

caa gcc act atg tta aaa acc ctg aac gat atc att ttg cat att cgc 7536
Gln Ala Thr Met Leu Lys Thr Leu Asn Asp Ile Ile Leu His Ile Arg
2500 2505 2510

tac acc att aaa taa 7551
Tyr Thr Ile Lys
2515



<210> 2
<211> 7515
<212> DNA 

<213> Photorhabdus luminescens 

<220> 

<221> CDS 

<222> (1)..(7512)

<400> 2 

atg caa aac tca tta tca agc act atc gat act att tgt cag aaa ctg 48
Met Gln Asn Ser Leu Ser Ser Thr Ile Asp Thr Ile Cys Gln Lys Leu
1 5 10 15

caa tta act tgt ccg gcg gaa att gct ttg tat ccc ttt gat act ttc 96
Gln Leu Thr Cys Pro Ala Glu Ile Ala Leu Tyr Pro Phe Asp Thr Phe
20 25 30

cgg gaa aaa act cgg gga atg gtt aat tgg ggg gaa gca aaa cgg att 144
Arg Glu Lys Thr Arg Gly Met Val Asn Trp Gly Glu Ala Lys Arg Ile
35 40 45

tat gaa att gca caa gcg gaa cag gat aga aac cta ctt cat gaa aaa 192
Tyr Glu Ile Ala Gln Ala Glu Gln Asp Arg Asn Leu Leu His Glu Lys
50 55 60

cgt att ttt gcc tat gct aat ccg ctg ctg aaa aac gct gtt cgg ttg 240
Arg Ile Phe Ala Tyr Ala Asn Pro Leu Leu Lys Asn Ala Val Arg Leu
65 70 75 80

ggt acc cgg caa atg ttg ggt ttt ata caa ggt tat agt gat ctg ttt 288
Gly Thr Arg Gln Met Leu Gly Phe Ile Gln Gly Tyr Ser Asp Leu Phe
85 90 95

ggt aat cgt gct gat aac tat gcc gcg ccg ggc tcg gtt gca tcg atg 336
Gly Asn Arg Ala Asp Asn Tyr Ala Ala Pro Gly Ser Val Ala Ser Met
100 105 110

ttc tca ccg gcg gct tat ttg acg gaa ttg tac cgt gaa gcc aaa aac 384
Phe Ser Pro Ala Ala Tyr Leu Thr Glu Leu Tyr Arg Glu Ala Lys Asn
115 120 125

ttg cat gac agc agc tca att tat tac cta gat aaa cgt cgc ccg gat 432
Leu His Asp Ser Ser Ser Ile Tyr Tyr Leu Asp Lys Arg Arg Pro Asp
130 135 140

tta gca agc tta atg ctc agc cag aaa aat atg gat gag gaa att tca 480
Leu Ala Ser Leu Met Leu Ser Gln Lys Asn Met Asp Glu Glu Ile Ser
145 150 155 160

acg ctg gct ctc tct aat gaa ttg tgc ctt gcc ggg atc gaa aca aaa 528
Thr Leu Ala Leu Ser Asn Glu Leu Cys Leu Ala Gly Ile Glu Thr Lys
165 170 175

aca gga aaa tca caa gat gaa gtg atg gat atg ttg tca act tat cgt 576
Thr Gly Lys Ser Gln Asp Glu Val Met Asp Met Leu Ser Thr Tyr Arg
180 185 190

tta agt gga gag aca cct tat cat cac gct tat gaa act gtt cgt gaa 624
Leu Ser Gly Glu Thr Pro Tyr His His Ala Tyr Glu Thr Val Arg Glu
195 200 205

atc gtt cat gaa cgt gat cca gga ttt cgt cat ttg tca cag gca ccc 672
Ile Val His Glu Arg Asp Pro Gly Phe Arg His Leu Ser Gln Ala Pro
210 215 220

att gtt gct gct aag ctc gat cct gtg act ttg ttg ggt att agc tcc 720
Ile Val Ala Ala Lys Leu Asp Pro Val Thr Leu Leu Gly Ile Ser Ser
225 230 235 240

cat att tcg cca gaa ctg tat aac ttg ctg att gag gag atc ccg gaa 768
His Ile Ser Pro Glu Leu Tyr Asn Leu Leu Ile Glu Glu Ile Pro Glu
245 250 255

aaa gat gaa gcc gcg ctt gat acg ctt tat aaa aca aac ttt ggc gat 816
Lys Asp Glu Ala Ala Leu Asp Thr Leu Tyr Lys Thr Asn Phe Gly Asp
260 265 270

att act act gct cag tta atg tcc cca agt tat ctg gcc cgg tat tat 864
Ile Thr Thr Ala Gln Leu Met Ser Pro Ser Tyr Leu Ala Arg Tyr Tyr
275 280 285

ggc gtc tca ccg gaa gat att gcc tac gtg acg act tca tta tca cat 912
Gly Val Ser Pro Glu Asp Ile Ala Tyr Val Thr Thr Ser Leu Ser His
290 295 300

gtt gga tat agc agt gat att ctg gtt att ccg ttg gtc gat ggt gtg 960
Val Gly Tyr Ser Ser Asp Ile Leu Val Ile Pro Leu Val Asp Gly Val
305 310 315 320

ggt aag atg gaa gta gtt cgt gtt acc cga aca cca tcg gat aat tat 1008
Gly Lys Met Glu Val Val Arg Val Thr Arg Thr Pro Ser Asp Asn Tyr
325 330 335

acc agt cag acg aat tat att gag ctg tat cca cag ggt ggc gac aat 1056
Thr Ser Gln Thr Asn Tyr Ile Glu Leu Tyr Pro Gln Gly Gly Asp Asn
340 345 350

tat ttg atc aaa tac aat cta agc aat agt ttt ggt ttg gat gat ttt 1104
Tyr Leu Ile Lys Tyr Asn Leu Ser Asn Ser Phe Gly Leu Asp Asp Phe
355 360 365

tat ctg caa tat aaa gat ggt tcc gct gat tgg act gag att gcc cat 1152
Tyr Leu Gln Tyr Lys Asp Gly Ser Ala Asp Trp Thr Glu Ile Ala His
370 375 380

aat ccc tat cct gat atg gtc ata aat caa aag tat gaa tca cag gcg 1200
Asn Pro Tyr Pro Asp Met Val Ile Asn Gln Lys Tyr Glu Ser Gln Ala
385 390 395 400

aca atc aaa cgt agt gac tct gac aat ata ctc agt ata ggg tta caa 1248
Thr Ile Lys Arg Ser Asp Ser Asp Asn Ile Leu Ser Ile Gly Leu Gln
405 410 415

aga tgg cat agc ggt agt tat aat ttt gcc gcc gcc aat ttt aaa att 1296
Arg Trp His Ser Gly Ser Tyr Asn Phe Ala Ala Ala Asn Phe Lys Ile
420 425 430

gac caa tac tcc ccg aaa gct ttc ctg ctt aaa atg aat aag gct att 1344
Asp Gln Tyr Ser Pro Lys Ala Phe Leu Leu Lys Met Asn Lys Ala Ile
435 440 445

cgg ttg ctc aaa gct acc ggc ctc tct ttt gct acg ttg gag cgt att 1392
Arg Leu Leu Lys Ala Thr Gly Leu Ser Phe Ala Thr Leu Glu Arg Ile
450 455 460

gtt gat agt gtt aat agc acc aaa tcc atc acg gtt gag gta tta aac 1440
Val Asp Ser Val Asn Ser Thr Lys Ser Ile Thr Val Glu Val Leu Asn
465 470 475 480

aag gtt tat cgg gta aaa ttc tat att gat cgt tat ggc atc agt gaa 1488
Lys Val Tyr Arg Val Lys Phe Tyr Ile Asp Arg Tyr Gly Ile Ser Glu
485 490 495

gag aca gcc gct att ttg gct aat att aat atc tct cag caa gct gtt 1536
Glu Thr Ala Ala Ile Leu Ala Asn Ile Asn Ile Ser Gln Gln Ala Val
500 505 510

ggc aat cag ctt agc cag ttt gag caa cta ttt aat cac ccg ccg ctc 1584
Gly Asn Gln Leu Ser Gln Phe Glu Gln Leu Phe Asn His Pro Pro Leu
515 520 525

aat ggt att cgc tat gaa atc agt gag gac aac tcc aaa cat ctt cct 1632
Asn Gly Ile Arg Tyr Glu Ile Ser Glu Asp Asn Ser Lys His Leu Pro
530 535 540

aat cct gat ctg aac ctt aaa cca gac agt acc ggt gat gat caa cgc 1680
Asn Pro Asp Leu Asn Leu Lys Pro Asp Ser Thr Gly Asp Asp Gln Arg
545 550 555 560

aag gcg gtt tta aaa cgc gcg ttt cag gtt aac gcc agt gag ttg tat 1728
Lys Ala Val Leu Lys Arg Ala Phe Gln Val Asn Ala Ser Glu Leu Tyr
565 570 575

cag atg tta ttg atc act gat cgt aaa gaa gac ggt gtt atc aaa aat 1776
Gln Met Leu Leu Ile Thr Asp Arg Lys Glu Asp Gly Val Ile Lys Asn
580 585 590

aac tta gag aat ttg tct gat ctg tat ttg gtt agt ttg ctg gcc cag 1824
Asn Leu Glu Asn Leu Ser Asp Leu Tyr Leu Val Ser Leu Leu Ala Gln
595 600 605

att cat aac ctg act att gct gaa ttg aac att ttg ttg gtg att tgt 1872
Ile His Asn Leu Thr Ile Ala Glu Leu Asn Ile Leu Leu Val Ile Cys
610 615 620

ggc tat ggc gac acc aac att tat cag att acc gac gat aat tta gcc 1920
Gly Tyr Gly Asp Thr Asn Ile Tyr Gln Ile Thr Asp Asp Asn Leu Ala
625 630 635 640

aaa ata gtg gaa aca ttg ttg tgg atc act caa tgg ttg aag acc caa 1968
Lys Ile Val Glu Thr Leu Leu Trp Ile Thr Gln Trp Leu Lys Thr Gln
645 650 655

aaa tgg aca gtt acc gac ctg ttt ctg atg acc acg gcc act tac agc 2016
Lys Trp Thr Val Thr Asp Leu Phe Leu Met Thr Thr Ala Thr Tyr Ser
660 665 670

acc act tta acg cca gaa att agc aat ctg acg gct acg ttg tct tca 2064
Thr Thr Leu Thr Pro Glu Ile Ser Asn Leu Thr Ala Thr Leu Ser Ser
675 680 685

act ttg cat ggc aaa gag agt ctg att ggg gaa gat ctg aaa aga gca 2112
Thr Leu His Gly Lys Glu Ser Leu Ile Gly Glu Asp Leu Lys Arg Ala
690 695 700

atg gcg cct tgc ttc act tcg gct ttg cat ttg act tct caa gaa gtt 2160
Met Ala Pro Cys Phe Thr Ser Ala Leu His Leu Thr Ser Gln Glu Val
705 710 715 720

gcg tat gac ctg ctg ttg tgg ata gac cag att caa ccg gca caa ata 2208
Ala Tyr Asp Leu Leu Leu Trp Ile Asp Gln Ile Gln Pro Ala Gln Ile
725 730 735

act gtt gat ggg ttt tgg gaa gaa gtg caa aca aca cca acc agc ttg 2256
Thr Val Asp Gly Phe Trp Glu Glu Val Gln Thr Thr Pro Thr Ser Leu
740 745 750

aag gtg att acc ttt gct cag gtg ctg gca caa ttg agc ctg atc tat 2304
Lys Val Ile Thr Phe Ala Gln Val Leu Ala Gln Leu Ser Leu Ile Tyr
755 760 765

cgt cgt att ggg tta agt gaa acg gaa ctg tca ctg atc gtg act caa 2352
Arg Arg Ile Gly Leu Ser Glu Thr Glu Leu Ser Leu Ile Val Thr Gln
770 775 780

tct tct ctg cta gtg gca ggc aaa agc ata ctg gat cac ggt ctg tta 2400
Ser Ser Leu Leu Val Ala Gly Lys Ser Ile Leu Asp His Gly Leu Leu
785 790 795 800

acc ctg atg gcc ttg gaa ggt ttt cat acc tgg gtt aat ggc ttg ggg 2448
Thr Leu Met Ala Leu Glu Gly Phe His Thr Trp Val Asn Gly Leu Gly
805 810 815

caa cat gcc tcc ttg ata ttg gcg gcg ttg aaa gac gga gcc ttg aca 2496
Gln His Ala Ser Leu Ile Leu Ala Ala Leu Lys Asp Gly Ala Leu Thr
820 825 830

gtt acc gat gta gca caa gct atg aat aag gag gaa tct ctc cta caa 2544
Val Thr Asp Val Ala Gln Ala Met Asn Lys Glu Glu Ser Leu Leu Gln
835 840 845

atg gca gct aat cag gtg gag aag gat cta aca aaa ctg acc agt tgg 2592
Met Ala Ala Asn Gln Val Glu Lys Asp Leu Thr Lys Leu Thr Ser Trp
850 855 860

aca cag att gac gct att ctg caa tgg tta cag atg tct tcg gcc ttg 2640
Thr Gln Ile Asp Ala Ile Leu Gln Trp Leu Gln Met Ser Ser Ala Leu
865 870 875 880

gcg gtt tct cca ctg gat ctg gca ggg atg atg gcc ctg aaa tat ggg 2688
Ala Val Ser Pro Leu Asp Leu Ala Gly Met Met Ala Leu Lys Tyr Gly
885 890 895

ata gat cat aac tat gct gcc tgg caa gct gcg gcg gct gcg ctg atg 2736
Ile Asp His Asn Tyr Ala Ala Trp Gln Ala Ala Ala Ala Ala Leu Met
900 905 910

gct gat cat gct aat cag gca cag aaa aaa ctg gat gag acg ttc agt 2784
Ala Asp His Ala Asn Gln Ala Gln Lys Lys Leu Asp Glu Thr Phe Ser
915 920 925

aag gca tta tgt aac tat tat att aat gct gtt gtc gat agt gct gct 2832
Lys Ala Leu Cys Asn Tyr Tyr Ile Asn Ala Val Val Asp Ser Ala Ala
930 935 940

gga gta cgt gat cgt aac ggt tta tat acc tat ttg ctg att gat aat 2880
Gly Val Arg Asp Arg Asn Gly Leu Tyr Thr Tyr Leu Leu Ile Asp Asn
945 950 955 960

cag gtt tct gcc gat gtg atc act tca cgt att gca gaa gct atc gcc 2928
Gln Val Ser Ala Asp Val Ile Thr Ser Arg Ile Ala Glu Ala Ile Ala
965 970 975

ggt att caa ctg tac gtt aac cgg gct tta aac cga gat gaa ggt cag 2976
Gly Ile Gln Leu Tyr Val Asn Arg Ala Leu Asn Arg Asp Glu Gly Gln
980 985 990

ctt gca tcg gac gtt agt acc cgt cag ttc ttc act gac tgg gaa cgt 3024
Leu Ala Ser Asp Val Ser Thr Arg Gln Phe Phe Thr Asp Trp Glu Arg
995 1000 1005

tac aat aaa cgt tac agt act tgg gct ggt gtc tct gaa ctg gtc tat 3072
Tyr Asn Lys Arg Tyr Ser Thr Trp Ala Gly Val Ser Glu Leu Val Tyr
1010 1015 1020

tat cca gaa aac tat gtt gat ccc act cag cgc att ggg caa acc aaa 3120
Tyr Pro Glu Asn Tyr Val Asp Pro Thr Gln Arg Ile Gly Gln Thr Lys
1025 1030 1035 1040

atg atg gat gcg ctg ttg caa tcc atc aac cag agc cag cta aat gcg 3168
Met Met Asp Ala Leu Leu Gln Ser Ile Asn Gln Ser Gln Leu Asn Ala
1045 1050 1055

gat acg gtg gaa gat gct ttc aaa act tat ttg acc agc ttt gag cag 3216
Asp Thr Val Glu Asp Ala Phe Lys Thr Tyr Leu Thr Ser Phe Glu Gln
1060 1065 1070

gta gca aat ctg aaa gta att agt gct tac cac gat aat gtg aat gtg 3264
Val Ala Asn Leu Lys Val Ile Ser Ala Tyr His Asp Asn Val Asn Val
1075 1080 1085

gat caa gga tta act tat ttt atc ggt atc gac caa gca gct ccg ggt 3312
Asp Gln Gly Leu Thr Tyr Phe Ile Gly Ile Asp Gln Ala Ala Pro Gly
1090 1095 1100

acg tat tac tgg cgt agt gtt gat cac agc aaa tgt gaa aat ggc aag 3360
Thr Tyr Tyr Trp Arg Ser Val Asp His Ser Lys Cys Glu Asn Gly Lys
1105 1110 1115 1120

ttt gcc gct aat gct tgg ggt gag tgg aat aaa att acc tgt gct gtc 3408
Phe Ala Ala Asn Ala Trp Gly Glu Trp Asn Lys Ile Thr Cys Ala Val
1125 1130 1135

aat cct tgg aaa aat atc atc cgt ccg gtt gtt tat atg tcc cgc tta 3456
Asn Pro Trp Lys Asn Ile Ile Arg Pro Val Val Tyr Met Ser Arg Leu
1140 1145 1150

tat ctg cta tgg ctg gag cag caa tca aag aaa agt gat gat ggt aaa 3504
Tyr Leu Leu Trp Leu Glu Gln Gln Ser Lys Lys Ser Asp Asp Gly Lys
1155 1160 1165

acc acg att tat caa tat aac tta aaa ctg gct cat att cgt tac gac 3552
Thr Thr Ile Tyr Gln Tyr Asn Leu Lys Leu Ala His Ile Arg Tyr Asp
1170 1175 1180

ggt agt tgg aat aca cca ttt act ttt gat gtg aca gaa aag gta aaa 3600
Gly Ser Trp Asn Thr Pro Phe Thr Phe Asp Val Thr Glu Lys Val Lys
1185 1190 1195 1200

aat tac acg tcg agt act gat gct gct gaa tct tta ggg ttg tat tgt 3648
Asn Tyr Thr Ser Ser Thr Asp Ala Ala Glu Ser Leu Gly Leu Tyr Cys
1205 1210 1215

act ggt tat caa ggg gaa gac act cta tta gtt atg ttc tat tcg atg 3696
Thr Gly Tyr Gln Gly Glu Asp Thr Leu Leu Val Met Phe Tyr Ser Met
1220 1225 1230

cag agt agt tat agc tcc tat acc gat aat aat gcg ccg gtc act ggg 3744
Gln Ser Ser Tyr Ser Ser Tyr Thr Asp Asn Asn Ala Pro Val Thr Gly
1235 1240 1245

cta tat att ttc gct gat atg tca tca gac aat atg acg aat gca caa 3792
Leu Tyr Ile Phe Ala Asp Met Ser Ser Asp Asn Met Thr Asn Ala Gln
1250 1255 1260

gca act aac tat tgg aat aac agt tat ccg caa ttt gat act gtg atg 3840
Ala Thr Asn Tyr Trp Asn Asn Ser Tyr Pro Gln Phe Asp Thr Val Met
1265 1270 1275 1280

gca gat ccg gat agc gac aat aaa aaa gtc ata acc aga aga gtt aat 3888
Ala Asp Pro Asp Ser Asp Asn Lys Lys Val Ile Thr Arg Arg Val Asn
1285 1290 1295

aac cgt tat gcg gag gat tat gaa att cct tcc tct gtg aca agt aac 3936
Asn Arg Tyr Ala Glu Asp Tyr Glu Ile Pro Ser Ser Val Thr Ser Asn
1300 1305 1310

agt aat tat tct tgg ggt gat cac agt tta acc atg ctt tat ggt ggt 3984
Ser Asn Tyr Ser Trp Gly Asp His Ser Leu Thr Met Leu Tyr Gly Gly
1315 1320 1325

agt gtt cct aat att act ttt gaa tcg gcg gca gaa gat tta agg cta 4032
Ser Val Pro Asn Ile Thr Phe Glu Ser Ala Ala Glu Asp Leu Arg Leu
1330 1335 1340

tct acc aat atg gca ttg agt att att cat aat gga tat gcg gga acc 4080
Ser Thr Asn Met Ala Leu Ser Ile Ile His Asn Gly Tyr Ala Gly Thr
1345 1350 1355 1360

cgc cgt ata caa tgt aat ctt atg aaa caa tac gct tca tta ggt gat 4128
Arg Arg Ile Gln Cys Asn Leu Met Lys Gln Tyr Ala Ser Leu Gly Asp
1365 1370 1375

aaa ttt ata att tat gat tca tca ttt gat gat gca aac cgt ttt aat 4176
Lys Phe Ile Ile Tyr Asp Ser Ser Phe Asp Asp Ala Asn Arg Phe Asn
1380 1385 1390

ctg gtg cca ttg ttt aaa ttc gga aaa gac gag aac tca gat gat agt 4224
Leu Val Pro Leu Phe Lys Phe Gly Lys Asp Glu Asn Ser Asp Asp Ser
1395 1400 1405

att tgt ata tat aat gaa aac act tcc tct gaa gat aag aag tgg tat 4272
Ile Cys Ile Tyr Asn Glu Asn Pro Ser Ser Glu Asp Lys Lys Trp Tyr
1410 1415 1420

ttt tct tcg aaa gat gac aat aaa aca gcg gat tat aat ggt gga act 4320
Phe Ser Ser Lys Asp Asp Asn Lys Thr Ala Asp Tyr Asn Gly Gly Thr
1425 1430 1435 1440

caa tgt ata gat gct gga acc agt aac aaa gat ttt tat tat aat ctc 4368
Gln Cys Ile Asp Ala Gly Thr Ser Asn Lys Asp Phe Tyr Tyr Asn Leu
1445 1450 1455

cag gag att gaa gta att agt gtt act ggt ggg tat tgg tcg agt tat 4416
Gln Glu Ile Glu Val Ile Ser Val Thr Gly Gly Tyr Trp Ser Ser Tyr
1460 1465 1470

aaa ata tcc aac ccg att aat atc aat acg ggc att gat agt gct aaa 4464
Lys Ile Ser Asn Pro Ile Asn Ile Asn Thr Gly Ile Asp Ser Ala Lys
1475 1480 1485

gta aaa gtc acc gta aaa gcg ggt ggt gac gat caa atc ttt act gct 4512
Val Lys Val Thr Val Lys Ala Gly Gly Asp Asp Gln Ile Phe Thr Ala
1490 1495 1500

gat aat agt acc tat gtt cct cag caa ccg gca ccc agt ttt gag gag 4560
Asp Asn Ser Thr Tyr Val Pro Gln Gln Pro Ala Pro Ser Phe Glu Glu
1505 1510 1515 1520

atg att tat cag ttc aat aac ctg aca ata gat tgt aag aat tta aat 4608
Met Ile Tyr Gln Phe Asn Asn Leu Thr Ile Asp Cys Lys Asn Leu Asn
1525 1530 1535

ttc atc gac aat cag gca cat att gag att gat ttc acc gct acg gca 4656
Phe Ile Asp Asn Gln Ala His Ile Glu Ile Asp Phe Thr Ala Thr Ala
1540 1545 1550

caa gat ggc cga ttc ttg ggt gca gaa act ttt att atc ccg gta act 4704
Gln Asp Gly Arg Phe Leu Gly Ala Glu Thr Phe Ile Ile Pro Val Thr
1555 1560 1565

aaa aaa gtt ctc ggt act gag aac gtg att gcg tta tat agc gaa aat 4752
Lys Lys Val Leu Gly Thr Glu Asn Val Ile Ala Leu Tyr Ser Glu Asn
1570 1575 1580

aac ggt gtt caa tat atg caa att ggc gca tat cgt acc cgt ttg aat 4800
Asn Gly Val Gln Tyr Met Gln Ile Gly Ala Tyr Arg Thr Arg Leu Asn
1585 1590 1595 1600

acg tta ttc gct caa cag ttg gtt agc cgt gct aat cgt ggc att gat 4848
Thr Leu Phe Ala Gln Gln Leu Val Ser Arg Ala Asn Arg Gly Ile Asp
1605 1610 1615

gca gtg ctc agt atg gaa act cag aat att cag gaa ccg caa tta gga 4896
Ala Val Leu Ser Met Glu Thr Gln Asn Ile Gln Glu Pro Gln Leu Gly
1620 1625 1630

gcg ggc aca tat gtg cag ctt gtg ttg gat aaa tat gat gag tct att 4944
Ala Gly Thr Tyr Val Gln Leu Val Leu Asp Lys Tyr Asp Glu Ser Ile
1635 1640 1645

cat ggc act aat aaa agc ttt gct att gaa tat gtt gat ata ttt aaa 4992
His Gly Thr Asn Lys Ser Phe Ala Ile Glu Tyr Val Asp Ile Phe Lys
1650 1655 1660

gag aac gat agt ttt gtg att tat caa gga gaa ctt agc gaa aca agt 5040
Glu Asn Asp Ser Phe Val Ile Tyr Gln Gly Glu Leu Ser Glu Thr Ser

1665 1670 1675 1680 

caa act gtt gtg aaa gtt ttc tta tcc tat ttt ata gag gcg act gga 5088
Gln Thr Val Val Lys Val Phe Leu Ser Tyr Phe Ile Glu Ala Thr Gly
1685 1690 1695

aat aag aac cac tta tgg gta cgt gct aaa tac caa aag gaa acg act 5136
Asn Lys Asn His Leu Trp Val Arg Ala Lys Tyr Gln Lys Glu Thr Thr
1700 1705 1710

gat aag atc ttg ttc gac cgt act gat gag aaa gat ccg cac ggt tgg 5184
Asp Lys Ile Leu Phe Asp Arg Thr Asp Glu Lys Asp Pro His Gly Trp
1715 1720 1725

ttt ctc agc gac gat cac aag acc ttt agt ggt ctc tct tcc gca cag 5232
Phe Leu Ser Asp Asp His Lys Thr Phe Ser Gly Leu Ser Ser Ala Gln
1730 1735 1740

gca tta aag aac gac agt gaa ccg atg gat ttc tct ggc gcc aat gct 5280
Ala Leu Lys Asn Asp Ser Glu Pro Met Asp Phe Ser Gly Ala Asn Ala
1745 1750 1755 1760

ctc tat ttc tgg gaa ctg ttc tat tac acg ccg atg atg atg gct cat 5328
Leu Tyr Phe Trp Glu Leu Phe Tyr Tyr Thr Pro Met Met Met Ala His
1765 1770 1775

cgt ttg ttg cag gaa cag aat ttt gat gcg gcg aac cat tgg ttc cgt 5376
Arg Leu Leu Gln Glu Gln Asn Phe Asp Ala Ala Asn His Trp Phe Arg
1780 1785 1790

tat gtc tgg agt cca tcc ggt tat atc gtt gat ggt aaa att gct atc 5424
Tyr Val Trp Ser Pro Ser Gly Tyr Ile Val Asp Gly Lys Ile Ala Ile
1795 1800 1805

tac cac tgg aac gtg cga ccg ctg gaa gaa gac acc agt tgg aat gca 5472
Tyr His Trp Asn Val Arg Pro Leu Glu Glu Asp Thr Ser Trp Asn Ala
1810 1815 1820

caa caa ctg gac tcc acc gat cca gat gct gta gcc caa gat gat ccg 5520
Gln Gln Leu Asp Ser Thr Asp Pro Asp Ala Val Ala Gln Asp Asp Pro
1825 1830 1835 1840

atg cac tac aag gtg gct acc ttt atg gcg acg ttg gat ctg cta atg 5568
Met His Tyr Lys Val Ala Thr Phe Met Ala Thr Leu Asp Leu Leu Met
1845 1850 1855

gcc cgt ggt gat gct gct tac cgc cag tta gag cgt gat acg ttg gct 5616
Ala Arg Gly Asp Ala Ala Tyr Arg Gln Leu Glu Arg Asp Thr Leu Ala
1860 1865 1870

gaa gct aaa atg tgg tat aca cag gcg ctt aat ctg ttg ggt gat gag 5664
Glu Ala Lys Met Trp Tyr Thr Gln Ala Leu Asn Leu Leu Gly Asp Glu
1875 1880 1885

cca caa gtg atg ctg agt acg act tgg gct aat cca aca ttg ggt aat 5712
Pro Gln Val Met Leu Ser Thr Thr Trp Ala Asn Pro Thr Leu Gly Asn
1890 1895 1900

gct gct tca aaa acc aca cag cag gtt cgt cag caa gtg ctt acc cag 5760
Ala Ala Ser Lys Thr Thr Gln Gln Val Arg Gln Gln Val Leu Thr Gln
1905 1910 1915 1920

ttg cgt ctc aat agc agg gta aaa acc ccg ttg cta gga aca gcc aat 5808
Leu Arg Leu Asn Ser Arg Val Lys Thr Pro Leu Leu Gly Thr Ala Asn
1925 1930 1935



tcc ctg acc gct tta ttc ctg ccg cag gaa aat agc aag ctc aaa ggc 5856
Ser Leu Thr Ala Leu Phe Leu Pro Gln Glu Asn Ser Lys Leu Lys Gly
1940 1945 1950

tac tgg cgg aca ctg gcg cag cgt atg ttt aat tta cgt cat aat ctg 5904
Tyr Trp Arg Thr Leu Ala Gln Arg Met Phe Asn Leu Arg His Asn Leu
1955 1960 1965

tcg att gac ggc cag ccg ctc tcc ttg ccg ctg tat gct aaa ccg gct 5952
Ser Ile Asp Gly Gln Pro Leu Ser Leu Pro Leu Tyr Ala Lys Pro Ala
1970 1975 1980

gat cca aaa gct tta ctg agt gcg gcg gtt tca gct tct caa ggg gga 6000
Asp Pro Lys Ala Leu Leu Ser Ala Ala Val Ser Ala Ser Gln Gly Gly
1985 1990 1995 2000

gcc gac ttg ccg aag gcg ccg ctg act att cac cgc ttc cct caa atg 6048
Ala Asp Leu Pro Lys Ala Pro Leu Thr Ile His Arg Phe Pro Gln Met
2005 2010 2015

cta gaa ggg gca cgg ggc ttg gtt aac cag ctt ata cag ttc ggt agt 6096
Leu Glu Gly Ala Arg Gly Leu Val Asn Gln Leu Ile Gln Phe Gly Ser
2020 2025 2030

tca cta ttg ggg tac agt gag cgt cag gat gcg gaa gct atg agt caa 6144
Ser Leu Leu Gly Tyr Ser Glu Arg Gln Asp Ala Glu Ala Met Ser Gln
2035 2040 2045



cta ctg caa acc caa gcc agc gag tta ata ctg acc agt att cgt atg 6192
Leu Leu Gln Thr Gln Ala Ser Glu Leu Ile Leu Thr Ser Ile Arg Met
2050 2055 2060

cag gat aac caa ttg gca gag ctg gat tcg gaa aaa acc gcc ttg caa 6240
Gln Asp Asn Gln Leu Ala Glu Leu Asp Ser Glu Lys Thr Ala Leu Gln
2065 2070 2075 2080

gtc tct tta gct gga gtg caa caa cgg ttt gac agc tat agc caa ctg 6288
Val Ser Leu Ala Gly Val Gln Gln Arg Phe Asp Ser Tyr Ser Gln Leu
2085 2090 2095

tat gag gag aac atc aac gca ggt gag cag cga gcg ctg gcg tta cgc 6336
Tyr Glu Glu Asn Ile Asn Ala Gly Glu Gln Arg Ala Leu Ala Leu Arg
2100 2105 2110



tca gaa tct gct att gag tct cag gga gcg cag att tcc cgt atg gca 6384
Ser Glu Ser Ala Ile Glu Ser Gln Gly Ala Gln Ile Ser Arg Met Ala
2115 2120 2125

ggc gcg ggt gtt gat atg gca cca aat atc ttc ggc ctg gct gat ggc 6432
Gly Ala Gly Val Asp Met Ala Pro Asn Ile Phe Gly Leu Ala Asp Gly
2130 2135 2140

ggc atg cat tat ggt gct att gcc tat gcc atc gct gac ggt att gag 6480
Gly Met His Tyr Gly Ala Ile Ala Tyr Ala Ile Ala Asp Gly Ile Glu
2145 2150 2155 2160

ttg agt gct tct gcc aag atg gtt gat gcg gag aaa gtt gct cag tcg 6528
Leu Ser Ala Ser Ala Lys Met Val Asp Ala Glu Lys Val Ala Gln Ser
2165 2170 2175

gaa ata tat cgc cgt cgc cgt caa gaa tgg aaa att cag cgt gac aac 6576
Glu Ile Tyr Arg Arg Arg Arg Gln Glu Trp Lys Ile Gln Arg Asp Asn
2180 2185 2190

gca caa gcg gag att aac cag tta aac gcg caa ctg gaa tca ctg tct 6624
Ala Gln Ala Glu Ile Asn Gln Leu Asn Ala Gln Leu Glu Ser Leu Ser
2195 2200 2205

att cgc cgt gaa gcc gct gaa atg caa aaa gag tac ctg aaa acc cag 6672
Ile Arg Arg Glu Ala Ala Glu Met Gln Lys Glu Tyr Leu Lys Thr Gln
2210 2215 2220

caa gct cag gcg cag gca caa ctt act ttc tta aga agc aaa ttc agt 6720
Gln Ala Gln Ala Gln Ala Gln Leu Thr Phe Leu Arg Ser Lys Phe Ser
2225 2230 2235 2240

aat caa gcg tta tat agt tgg tta cga ggg cgt ttg tca ggt att tat 6768
Asn Gln Ala Leu Tyr Ser Trp Leu Arg Gly Arg Leu Ser Gly Ile Tyr
2245 2250 2255

ttc cag ttc tat gac ttg gcc gta tca cgt tgc ctg atg gca gag caa 6816
Phe Gln Phe Tyr Asp Leu Ala Val Ser Arg Cys Leu Met Ala Glu Gln
2260 2265 2270

tcc tat caa tgg gaa gct aat gat aat tcc att agc ttt gtc aaa ccg 6864
Ser Tyr Gln Trp Glu Ala Asn Asp Asn Ser Ile Ser Phe Val Lys Pro
2275 2280 2285

ggt gca tgg caa gga act tac gcc ggc tta ttg tgt gga gaa gct ttg 6912
Gly Ala Trp Gln Gly Thr Tyr Ala Gly Leu Leu Cys Gly Glu Ala Leu
2290 2295 2300

ata caa aat ctg gca caa atg gaa gag gca tat ctg aaa tgg gaa tct 6960
Ile Gln Asn Leu Ala Gln Met Glu Glu Ala Tyr Leu Lys Trp Glu Ser
2305 2310 2315 2320

cgc gct ttg gaa gta gaa cgc acg gtt tca ttg gca gtg gtt tat gat 7008
Arg Ala Leu Glu Val Glu Arg Thr Val Ser Leu Ala Val Val Tyr Asp
2325 2330 2335

tca ctg gaa ggt aat gat cgt ttt aat tta gcg gaa caa ata cct gca 7056
Ser Leu Glu Gly Asn Asp Arg Phe Asn Leu Ala Glu Gln Ile Pro Ala
2340 2345 2350

tta ttg gat aag ggg gag gga aca gca gga act aaa gaa aat ggg tta 7104
Leu Leu Asp Lys Gly Glu Gly Thr Ala Gly Thr Lys Glu Asn Gly Leu
2355 2360 2365

tca ttg gct aat gct atc ctg tca gct tcg gtc aaa ttg tcc gac ttg 7152
Ser Leu Ala Asn Ala Ile Leu Ser Ala Ser Val Lys Leu Ser Asp Leu
2370 2375 2380

aaa ctg gga acg gat tat cca gac agt atc gtt ggt agc aac aag gtt 7200
Lys Leu Gly Thr Asp Tyr Pro Asp Ser Ile Val Gly Ser Asn Lys Val
2385 2390 2395 2400

cgt cgt att aag caa atc agt gtt tcg cta cct gca ttg gtt ggg cct 7248
Arg Arg Ile Lys Gln Ile Ser Val Ser Leu Pro Ala Leu Val Gly Pro
2405 2410 2415

tat cag gat gtt cag gct atg ctc agc tat ggt ggc agt act caa ttg 7296
Tyr Gln Asp Val Gln Ala Met Leu Ser Tyr Gly Gly Ser Thr Gln Leu
2420 2425 2430

ccg aaa ggt tgt tca gcg ttg gct gtg tct cat ggt acc aat gat agt 7344
Pro Lys Gly Cys Ser Ala Leu Ala Val Ser His Gly Thr Asn Asp Ser
2435 2440 2445

ggt cag ttc cag ttg gat ttc aat gac ggc aaa tac ctg cca ttt gaa 7392
Gly Gln Phe Gln Leu Asp Phe Asn Asp Gly Lys Tyr Leu Pro Phe Glu
2450 2455 2460

ggt att gct ctt gat gat cag ggt aca ctg aat ctt caa ttt ccg aat 7440
Gly Ile Ala Leu Asp Asp Gln Gly Thr Leu Asn Leu Gln Phe Pro Asn
2465 2470 2475 2480

gct acc gac aag cag aaa gca ata ttg caa act atg agc gat att att 7488
Ala Thr Asp Lys Gln Lys Ala Ile Leu Gln Thr Met Ser Asp Ile Ile
2485 2490 2495

ttg cat att cgt tat acc atc cgt taa 7515
Leu His Ile Arg Tyr Thr Ile Arg
2500



<210> 3
<211> 7577
<212> DNA
<213> Artificial Sequence

<220>
<221> CDS
<222> (3)..(7553)

<220>
<223> Description of Artificial Sequence: hemicot tcdA

<400> 3

cc atg gct aac gag tcc gtc aag gag atc cca gac gtc ctc aag tcc 47
Met Ala Asn Glu Ser Val Lys Glu Ile Pro Asp Val Leu Lys Ser
1 5 10 15

caa tgc ggt ttc aac tgc ctc act gac atc tcc cac agc tcc ttc aac 95
Gln Cys Gly Phe Asn Cys Leu Thr Asp Ile Ser His Ser Ser Phe Asn
20 25 30

gag ttc aga caa caa gtc tct gag cac ctc tcc tgg tcc gag acc cat 143
Glu Phe Arg Gln Gln Val Ser Glu His Leu Ser Trp Ser Glu Thr His
35 40 45

gac ctc tac cat gac gct cag caa gct cag aag gac aac agg ctc tac 191
Asp Leu Tyr His Asp Ala Gln Gln Ala Gln Lys Asp Asn Arg Leu Tyr
50 55 60

gag gct agg atc ctc aag agg gct aac cca caa ctc cag aac gct gtc 239
Glu Ala Arg Ile Leu Lys Arg Ala Asn Pro Gln Leu Gln Asn Ala Val
65 70 75

cac ctc gcc atc ttg gct cca aac gct gag ttg att ggt tac aac aac 287
His Leu Ala Ile Leu Ala Pro Asn Ala Glu Leu Ile Gly Tyr Asn Asn
80 85 90 95

cag ttc tct ggc aga gct agc cag tac gtg gct cct ggt aca gtc tcc 335
Gln Phe Ser Gly Arg Ala Ser Gln Tyr Val Ala Pro Gly Thr Val Ser
100 105 110

tcc atg ttc agc cca gcc gct tac ctc act gag ttg tac cgc gag gct 383
Ser Met Phe Ser Pro Ala Ala Tyr Leu Thr Glu Leu Tyr Arg Glu Ala
115 120 125

agg aac ctt cat gct tct gac tcc gtc tac tac ttg gac aca cgc aga 431
Arg Asn Leu His Ala Ser Asp Ser Val Tyr Tyr Leu Asp Thr Arg Arg
130 135 140

cca gac ctc aag agc atg gcc ctc agc caa cag aac atg gac att gag 479
Pro Asp Leu Lys Ser Met Ala Leu Ser Gln Gln Asn Met Asp Ile Glu
145 150 155

ttg tcc acc ctc tcc ttg agc aac gag ctt ctc ttg gag tcc atc aag 527
Leu Ser Thr Leu Ser Leu Ser Asn Glu Leu Leu Leu Glu Ser Ile Lys
160 165 170 175

act gag agc aag ttg gag aac tac acc aag gtc atg gag atg ctc tcc 575
Thr Glu Ser Lys Leu Glu Asn Tyr Thr Lys Val Met Glu Met Leu Ser
180 185 190

acc ttc aga cca agc ggt gca act cca tac cat gat gcc tac gag aac 623
Thr Phe Arg Pro Ser Gly Ala Thr Pro Tyr His Asp Ala Tyr Glu Asn
195 200 205

gtc agg gag gtc atc caa ctt caa gac cct ggt ctt gag caa ctc aac 671
Val Arg Glu Val Ile Gln Leu Gln Asp Pro Gly Leu Glu Gln Leu Asn
210 215 220

gct tct cca gcc att gct ggt ttg atg cac cag gca tcc ttg ctc ggt 719
Ala Ser Pro Ala Ile Ala Gly Leu Met His Gln Ala Ser Leu Leu Gly
225 230 235

atc aac gcc tcc atc tct cct gag ttg ttc aac atc ttg act gag gag 767
Ile Asn Ala Ser Ile Ser Pro Glu Leu Phe Asn Ile Leu Thr Glu Glu
240 245 250 255

atc act gag ggc aac gct gag gag ttg tac aag aag aac ttc ggc aac 815
Ile Thr Glu Gly Asn Ala Glu Glu Leu Tyr Lys Lys Asn Phe Gly Asn
260 265 270

att gag cca gcc tct ctt gca atg cct gag tac ctc aag agg tac tac 863
Ile Glu Pro Ala Ser Leu Ala Met Pro Glu Tyr Leu Lys Arg Tyr Tyr
275 280 285

aac ttg tct gat gag gag ctt tct caa ttc att ggc aag gct tcc aac 911
Asn Leu Ser Asp Glu Glu Leu Ser Gln Phe Ile Gly Lys Ala Ser Asn
290 295 300

ttc ggt caa cag gag tac agc aac aac cag ctc atc act cca gtt gtg 959
Phe Gly Gln Gln Glu Tyr Ser Asn Asn Gln Leu Ile Thr Pro Val Val
305 310 315

aac tcc tct gat ggc act gtg aag gtc tac cgc atc aca cgt gag tac 1007
Asn Ser Ser Asp Gly Thr Val Lys Val Tyr Arg Ile Thr Arg Glu Tyr
320 325 330 335

acc aca aac gcc tac caa atg gat gtt gag ttg ttc cca ttc ggt ggt 1055
Thr Thr Asn Ala Tyr Gln Met Asp Val Glu Leu Phe Pro Phe Gly Gly
340 345 350

gag aac tac aga ctt gac tac aag ttc aag aac ttc tac aac gcc tcc 1103
Glu Asn Tyr Arg Leu Asp Tyr Lys Phe Lys Asn Phe Tyr Asn Ala Ser
355 360 365

tac ctc tcc atc aag ttg aac gac aag agg gag ctt gtc agg act gag 1151
Tyr Leu Ser Ile Lys Leu Asn Asp Lys Arg Glu Leu Val Arg Thr Glu
370 375 380

ggt gct cct caa gtg aac att gag tac tct gcc aac atc acc ctc aac 1199
Gly Ala Pro Gln Val Asn Ile Glu Tyr Ser Ala Asn Ile Thr Leu Asn
385 390 395



aca gct gac atc tct caa cca ttc gag att ggt ttg acc aga gtc ctt 1247
Thr Ala Asp Ile Ser Gln Pro Phe Glu Ile Gly Leu Thr Arg Val Leu
400 405 410 415

ccc tct ggc tcc tgg gcc tac gct gca gcc aag ttc act gtt gag gag 1295
Pro Ser Gly Ser Trp Ala Tyr Ala Ala Ala Lys Phe Thr Val Glu Glu
420 425 430

tac aac cag tac tct ttc ctc ttg aag ctc aac aag gca att cgt ctc 1343
Tyr Asn Gln Tyr Ser Phe Leu Leu Lys Leu Asn Lys Ala Ile Arg Leu
435 440 445

agc aga gcc act gag ttg tct ccc acc atc ttg gag ggc att gtg agg 1391
Ser Arg Ala Thr Glu Leu Ser Pro Thr Ile Leu Glu Gly Ile Val Arg
450 455 460

tct gtc aac ctt caa ctt gac atc aac act gat gtg ctt ggc aag gtc 1439
Ser Val Asn Leu Gln Leu Asp Ile Asn Thr Asp Val Leu Gly Lys Val
465 470 475

ttc ctc acc aag tac tac atg caa cgc tac gcc atc cat gct gag act 1487
Phe Leu Thr Lys Tyr Tyr Met Gln Arg Tyr Ala Ile His Ala Glu Thr
480 485 490 495

gca ctc atc ctc tgc aac gca ccc atc tct caa cgc tcc tac gac aac 1535
Ala Leu Ile Leu Cys Asn Ala Pro Ile Ser Gln Arg Ser Tyr Asp Asn
500 505 510

cag cct tcc cag ttc gac agg ctc ttc aac act cct ctc ttg aac ggc 1583
Gln Pro Ser Gln Phe Asp Arg Leu Phe Asn Thr Pro Leu Leu Asn Gly
515 520 525

cag tac ttc tcc act ggt gat gag gag att gac ctc aac tct ggc tcc 1631
Gln Tyr Phe Ser Thr Gly Asp Glu Glu Ile Asp Leu Asn Ser Gly Ser
530 535 540

aca ggt gac tgg aga aag acc atc ttg aag agg gcc ttc aac att gat 1679
Thr Gly Asp Trp Arg Lys Thr Ile Leu Lys Arg Ala Phe Asn Ile Asp
545 550 555

gat gtc tct ctc ttc cgt ctc ttg aag atc aca gat cac gac aac aag 1727
Asp Val Ser Leu Phe Arg Leu Leu Lys Ile Thr Asp His Asp Asn Lys
560 565 570 575

gat ggc aag atc aag aac aac ttg aag aac ctt tcc aac ctc tac att 1775
Asp Gly Lys Ile Lys Asn Asn Leu Lys Asn Leu Ser Asn Leu Tyr Ile
580 585 590

ggc aag ttg ctt gca gac atc cac caa ctc acc att gat gag ttg gac 1823
Gly Lys Leu Leu Ala Asp Ile His Gln Leu Thr Ile Asp Glu Leu Asp
595 600 605

ctc ttg ctc att gca gtc ggt gag ggc aag acc aac ctc tct gca atc 1871
Leu Leu Leu Ile Ala Val Gly Glu Gly Lys Thr Asn Leu Ser Ala Ile
610 615 620

tct gac aag cag ttg gca acc ctc atc agg aag ttg aac acc atc acc 1919
Ser Asp Lys Gln Leu Ala Thr Leu Ile Arg Lys Leu Asn Thr Ile Thr
625 630 635

tcc tgg ctt cac acc cag aag tgg tct gtc ttc caa ctc ttc atc atg 1967
Ser Trp Leu His Thr Gln Lys Trp Ser Val Phe Gln Leu Phe Ile Met
640 645 650 655

acc agc acc tcc tac aac aag acc ctc act cct gag atc aag aac ctc 2015
Thr Ser Thr Ser Tyr Asn Lys Thr Leu Thr Pro Glu Ile Lys Asn Leu
660 665 670

ttg gac aca gtc tac cac ggt ctc caa ggc ttc gac aag gac aag gct 2063
Leu Asp Thr Val Tyr His Gly Leu Gln Gly Phe Asp Lys Asp Lys Ala
675 680 685

gac ttg ctt cat gtc atg gct ccc tac att gca gcc acc ctc caa ctc 2111
Asp Leu Leu His Val Met Ala Pro Tyr Ile Ala Ala Thr Leu Gln Leu
690 695 700

tcc tct gag aac gtg gct cac tct gtc ttg ctc tgg gct gac aag ctc 2159
Ser Ser Glu Asn Val Ala His Ser Val Leu Leu Trp Ala Asp Lys Leu
705 710 715

caa cct ggt gat ggt gcc atg act gct gag aag ttc tgg gac tgg ctc 2207
Gln Pro Gly Asp Gly Ala Met Thr Ala Glu Lys Phe Trp Asp Trp Leu
720 725 730 735

aac acc aag tac aca cca ggc tcc tct gag gct gtt gag act caa gag 2255
Asn Thr Lys Tyr Thr Pro Gly Ser Ser Glu Ala Val Glu Thr Gln Glu
740 745 750

cac att gtg caa tac tgc cag gct ctt gca cag ttg gag atg gtc tac 2303
His Ile Val Gln Tyr Cys Gln Ala Leu Ala Gln Leu Glu Met Val Tyr
755 760 765

cac tcc act ggc atc aac gag aac gct ttc aga ctc ttc gtc acc aag 2351
His Ser Thr Gly Ile Asn Glu Asn Ala Phe Arg Leu Phe Val Thr Lys
770 775 780

cct gag atg ttc ggt gct gcc aca ggt gct gca cct gct cat gat gct 2399
Pro Glu Met Phe Gly Ala Ala Thr Gly Ala Ala Pro Ala His Asp Ala
785 790 795

ctc tcc ctc atc atg ttg acc agg ttc gct gac tgg gtc aac gct ctt 2447
Leu Ser Leu Ile Met Leu Thr Arg Phe Ala Asp Trp Val Asn Ala Leu
800 805 810 815

ggt gag aag gct tcc tct gtc ttg gct gcc ttc gag gcc aac tcc ctc 2495
Gly Glu Lys Ala Ser Ser Val Leu Ala Ala Phe Glu Ala Asn Ser Leu
820 825 830

act gct gag caa ctt gct gat gcc atg aac ctt gat gcc aac ctc ttg 2543
Thr Ala Glu Gln Leu Ala Asp Ala Met Asn Leu Asp Ala Asn Leu Leu
835 840 845

ctc caa gct tcc att caa gct cag aac cac caa cac ctc cca cct gtc 2591
Leu Gln Ala Ser Ile Gln Ala Gln Asn His Gln His Leu Pro Pro Val
850 855 860

act cca gag aac gct ttc tcc tgc tgg acc tcc atc aac acc atc ctc 2639
Thr Pro Glu Asn Ala Phe Ser Cys Trp Thr Ser Ile Asn Thr Ile Leu
865 870 875

caa tgg gtc aac gtg gct cag caa ctc aac gtg gct cca caa ggt gtc 2687
Gln Trp Val Asn Val Ala Gln Gln Leu Asn Val Ala Pro Gln Gly Val
880 885 890 895

tct gct ttg gtc ggt ctt gac tac atc cag tcc atg aag gag aca cca 2735
Ser Ala Leu Val Gly Leu Asp Tyr Ile Gln Ser Met Lys Glu Thr Pro
900 905 910

acc tac gct caa tgg gag aac gca gct ggt gtc ttg act gct ggt ctc 2783
Thr Tyr Ala Gln Trp Glu Asn Ala Ala Gly Val Leu Thr Ala Gly Leu
915 920 925

aac tcc caa cag gcc aac acc ctc cat gct ttc ttg gat gag tct cgc 2831
Asn Ser Gln Gln Ala Asn Thr Leu His Ala Phe Leu Asp Glu Ser Arg
930 935 940

tct gct gcc ctc tcc acc tac tac atc agg caa gtc gcc aag gca gct 2879
Ser Ala Ala Leu Ser Thr Tyr Tyr Ile Arg Gln Val Ala Lys Ala Ala
945 950 955

gct gcc atc aag tct cgc gat gac ctc tac caa tac ctc ctc att gac 2927
Ala Ala Ile Lys Ser Arg Asp Asp Leu Tyr Gln Tyr Leu Leu Ile Asp
960 965 970 975

aac cag gtc tct gct gcc atc aag acc acc agg atc gct gag gcc atc 2975
Asn Gln Val Ser Ala Ala Ile Lys Thr Thr Arg Ile Ala Glu Ala Ile
980 985 990

gct tcc atc caa ctc tac gtc aac cgc gct ctt gag aac gtt gag gag 3023
Ala Ser Ile Gln Leu Tyr Val Asn Arg Ala Leu Glu Asn Val Glu Glu
995 1000 1005

aac gcc aac tct ggt gtc atc tct cgc caa ttc ttc atc gac tgg gac 3071
Asn Ala Asn Ser Gly Val Ile Ser Arg Gln Phe Phe Ile Asp Trp Asp
1010 1015 1020

aag tac aac aag agg tac tcc acc tgg gct ggt gtc tct caa ctt gtc 3119
Lys Tyr Asn Lys Arg Tyr Ser Thr Trp Ala Gly Val Ser Gln Leu Val
1025 1030 1035

tac tac cca gag aac tac att gac cca acc atg agg att ggt cag acc 3167
Tyr Tyr Pro Glu Asn Tyr Ile Asp Pro Thr Met Arg Ile Gly Gln Thr
1040 1045 1050 1055

aag atg atg gat gct ctc ttg caa tct gtc tcc caa agc caa ctc aac 3215
Lys Met Met Asp Ala Leu Leu Gln Ser Val Ser Gln Ser Gln Leu Asn
1060 1065 1070

gct gac act gtg gag gat gcc ttc atg agc tac ctc acc tcc ttc gag 3263
Ala Asp Thr Val Glu Asp Ala Phe Met Ser Tyr Leu Thr Ser Phe Glu
1075 1080 1085

caa gtt gcc aac ctc aag gtc atc tct gct tac cat gac aac atc aac 3311
Gln Val Ala Asn Leu Lys Val Ile Ser Ala Tyr His Asp Asn Ile Asn
1090 1095 1100

aac gac caa ggt ctc acc tac ttc att ggt ctc tct gag act gat gct 3359
Asn Asp Gln Gly Leu Thr Tyr Phe Ile Gly Leu Ser Glu Thr Asp Ala
1105 1110 1115

ggt gag tac tac tgg aga tcc gtg gac cac agc aag ttc aac gat ggc 3407
Gly Glu Tyr Tyr Trp Arg Ser Val Asp His Ser Lys Phe Asn Asp Gly
1120 1125 1130 1135

aag ttc gct gca aac gct tgg tct gag tgg cac aag att gac tgc cct 3455
Lys Phe Ala Ala Asn Ala Trp Ser Glu Trp His Lys Ile Asp Cys Pro
1140 1145 1150

atc aac cca tac aag tcc acc atc aga cct gtc atc tac aag agc cgc 3503
Ile Asn Pro Tyr Lys Ser Thr Ile Arg Pro Val Ile Tyr Lys Ser Arg
1155 1160 1165

ctc tac ttg ctc tgg ctt gag cag aag gag atc acc aag caa act ggc 3551
Leu Tyr Leu Leu Trp Leu Glu Gln Lys Glu Ile Thr Lys Gln Thr Gly
1170 1175 1180

aac tcc aag gat ggt tac caa act gag act gac tac cgc tac gag ttg 3599
Asn Ser Lys Asp Gly Tyr Gln Thr Glu Thr Asp Tyr Arg Tyr Glu Leu
1185 1190 1195

aag ttg gct cac atc cgc tac gat ggt acc tgg aac act cca atc acc 3647
Lys Leu Ala His Ile Arg Tyr Asp Gly Thr Trp Asn Thr Pro Ile Thr
1200 1205 1210 1215

ttc gat gtc aac aag aag atc agc gag ttg aag ttg gag aag aac cgt 3695
Phe Asp Val Asn Lys Lys Ile Ser Glu Leu Lys Leu Glu Lys Asn Arg
1220 1225 1230

gct cct ggt ctc tac tgc gct ggt tac caa ggt gag gac acc ctc ttg 3743
Ala Pro Gly Leu Tyr Cys Ala Gly Tyr Gln Gly Glu Asp Thr Leu Leu
1235 1240 1245

gtc atg ttc tac aac cag caa gac acc ctt gac tcc tac aag aac gct 3791
Val Met Phe Tyr Asn Gln Gln Asp Thr Leu Asp Ser Tyr Lys Asn Ala
1250 1255 1260

tcc atg caa ggt ctc tac atc ttc gct gac atg gct tcc aag gac atg 3839
Ser Met Gln Gly Leu Tyr Ile Phe Ala Asp Met Ala Ser Lys Asp Met
1265 1270 1275

act cca gag caa agc aac gtc tac cgt gac aac tcc tac caa cag ttc 3887
Thr Pro Glu Gln Ser Asn Val Tyr Arg Asp Asn Ser Tyr Gln Gln Phe
1280 1285 1290 1295

gac acc aac aac gtc agg cgt gtc aac aac aga tac gct gag gac tac 3935
Asp Thr Asn Asn Val Arg Arg Val Asn Asn Arg Tyr Ala Glu Asp Tyr
1300 1305 1310

gag atc cca agc tct gtc agc tct cgc aag gac tac ggc tgg ggt gac 3983
Glu Ile Pro Ser Ser Val Ser Ser Arg Lys Asp Tyr Gly Trp Gly Asp
1315 1320 1325

tac tac ctc agc atg gtg tac aac ggt gac atc cca acc atc aac tac 4031
Tyr Tyr Leu Ser Met Val Tyr Asn Gly Asp Ile Pro Thr Ile Asn Tyr
1330 1335 1340

aag gct gcc tct tcc gac ctc aaa atc tac atc agc cca aag ctc agg 4079
Lys Ala Ala Ser Ser Asp Leu Lys Ile Tyr Ile Ser Pro Lys Leu Arg
1345 1350 1355

atc atc cac aac ggc tac gag ggt cag aag agg aac cag tgc aac ttg 4127
Ile Ile His Asn Gly Tyr Glu Gly Gln Lys Arg Asn Gln Cys Asn Leu
1360 1365 1370 1375

atg aac aag tac ggc aag ttg ggt gac aag ttc att gtc tac acc tct 4175
Met Asn Lys Tyr Gly Lys Leu Gly Asp Lys Phe Ile Val Tyr Thr Ser
1380 1385 1390

ctt ggt gtc aac cca aac aac agc tcc aac aag ctc atg ttc tac cca 4223
Leu Gly Val Asn Pro Asn Asn Ser Ser Asn Lys Leu Met Phe Tyr Pro
1395 1400 1405

gtc tac caa tac tct ggc aac acc tct ggt ctc aac cag ggt aga ctc 4271
Val Tyr Gln Tyr Ser Gly Asn Thr Ser Gly Leu Asn Gln Gly Arg Leu
1410 1415 1420

ttg ttc cac agg gac acc acc tac cca agc aag gtg gag gct tgg att 4319
Leu Phe His Arg Asp Thr Thr Tyr Pro Ser Lys Val Glu Ala Trp Ile
1425 1430 1435

cct ggt gcc aag agg tcc ctc acc aac cag aac gct gcc att ggt gat 4367
Pro Gly Ala Lys Arg Ser Leu Thr Asn Gln Asn Ala Ala Ile Gly Asp
1440 1445 1450 1455

gac tac gcc aca gac tcc ctc aac aag cct gat gac ctc aag cag tac 4415
Asp Tyr Ala Thr Asp Ser Leu Asn Lys Pro Asp Asp Leu Lys Gln Tyr
1460 1465 1470

atc ttc atg act gac tcc aag ggc aca gcc act gat gtc tct ggt cca 4463
Ile Phe Met Thr Asp Ser Lys Gly Thr Ala Thr Asp Val Ser Gly Pro
1475 1480 1485

gtg gag atc aac act gca atc agc cca gcc aag gtc caa atc att gtc 4511
Val Glu Ile Asn Thr Ala Ile Ser Pro Ala Lys Val Gln Ile Ile Val
1490 1495 1500

aag gct ggt ggc aag gag caa acc ttc aca gct gac aag gat gtc tcc 4559
Lys Ala Gly Gly Lys Glu Gln Thr Phe Thr Ala Asp Lys Asp Val Ser
1505 1510 1515

atc cag cca agc cca tcc ttc gat gag atg aac tac caa ttc aac gct 4607
Ile Gln Pro Ser Pro Ser Phe Asp Glu Met Asn Tyr Gln Phe Asn Ala
1520 1525 1530 1535

ctt gag att gat ggt tct ggc ctc aac ttc atc aac aac tct gct tcc 4655
Leu Glu Ile Asp Gly Ser Gly Leu Asn Phe Ile Asn Asn Ser Ala Ser
1540 1545 1550

att gat gtc acc ttc act gcc ttc gct gag gat ggc cgc aag ttg ggt 4703
Ile Asp Val Thr Phe Thr Ala Phe Ala Glu Asp Gly Arg Lys Leu Gly
1555 1560 1565

tac gag agc ttc tcc atc cca gtc acc ctt aag gtt tcc act gac aac 4751
Tyr Glu Ser Phe Ser Ile Pro Val Thr Leu Lys Val Ser Thr Asp Asn
1570 1575 1580

gca ctc acc ctt cat cac aac gag aac ggt gct cag tac atg caa tgg 4799
Ala Leu Thr Leu His His Asn Glu Asn Gly Ala Gln Tyr Met Gln Trp
1585 1590 1595

caa agc tac cgc acc agg ttg aac acc ctc ttc gca agg caa ctt gtg 4847
Gln Ser Tyr Arg Thr Arg Leu Asn Thr Leu Phe Ala Arg Gln Leu Val
1600 1605 1610 1615

gcc cgt gcc acc aca ggc att gac acc atc ctc agc atg gag acc cag 4895
Ala Arg Ala Thr Thr Gly Ile Asp Thr Ile Leu Ser Met Glu Thr Gln
1620 1625 1630

aac atc caa gag cca cag ttg ggc aag ggt ttc tac gcc acc ttc gtc 4943
Asn Ile Gln Glu Pro Gln Leu Gly Lys Gly Phe Tyr Ala Thr Phe Val
1635 1640 1645

atc cca cct tac aac ctc agc act cat ggt gat gag agg tgg ttc aag 4991
Ile Pro Pro Tyr Asn Leu Ser Thr His Gly Asp Glu Arg Trp Phe Lys
1650 1655 1660

ctc tac atc aag cac gtg gtt gac aac aac tcc cac atc atc tac tct 5039
Leu Tyr Ile Lys His Val Val Asp Asn Asn Ser His Ile Ile Tyr Ser
1665 1670 1675

ggt caa ctc act gac acc aac atc aac atc acc ctc ttc atc cca ctt 5087
Gly Gln Leu Thr Asp Thr Asn Ile Asn Ile Thr Leu Phe Ile Pro Leu
1680 1685 1690 1695

gac gat gtc cca ctc aac cag gac tac cat gcc aag gtc tac atg acc 5135
Asp Asp Val Pro Leu Asn Gln Asp Tyr His Ala Lys Val Tyr Met Thr
1700 1705 1710

ttc aag aag tct cca tct gat ggc acc tgg tgg ggt cca cac ttc gtc 5183
Phe Lys Lys Ser Pro Ser Asp Gly Thr Trp Trp Gly Pro His Phe Val
1715 1720 1725

cgt gat gac aag ggc atc gtc acc atc aac cca aag tcc atc ctc acc 5231
Arg Asp Asp Lys Gly Ile Val Thr Ile Asn Pro Lys Ser Ile Leu Thr
1730 1735 1740

cac ttc gag tct gtc aac gtt ctc aac aac atc tcc tct gag cca atg 5279
His Phe Glu Ser Val Asn Val Leu Asn Asn Ile Ser Ser Glu Pro Met
1745 1750 1755

gac ttc tct ggt gcc aac tcc ctc tac ttc tgg gag ttg ttc tac tac 5327
Asp Phe Ser Gly Ala Asn Ser Leu Tyr Phe Trp Glu Leu Phe Tyr Tyr
1760 1765 1770 1775

aca cca atg ctt gtg gct caa agg ttg ctc cat gag cag aac ttc gat 5375
Thr Pro Met Leu Val Ala Gln Arg Leu Leu His Glu Gln Asn Phe Asp
1780 1785 1790

gag gcc aac agg tgg etc aag tac gtc tgg age cca tct ggt tac att 5423 
Glu Ala Asn Arg Trp Leu hys Tyr Val Trp Ser Pro Ser Gly Tyr lie 
1795 1800 1805 

gtg cat ggt caa ate cag aac tac caa tgg aac gtc agg cca ttg ctt 5471 
Val His Gly Gin lie Gin Asn Tyr Gin Trp Asn Val Arg Pro Leu Leu 
1810 1815 1820 

gag gac acc tec tgg aac tct gac cca ctt gae tct gtg gae cct gat 5519 
Glu Asp Thr Ser Trp Asn Ser Asp Pro Leu Asp Ser Val Asp Pro Asp 
1825 1830 1835 

get gtg get caa cat gac cca atg eac tac aag gtc tec acc ttc atg 5567 
Ala Val Ala Gin His Asp Pro Met His Tyr Lys Val Ser Thr Phe Met 
1840 1845 1850 1855 

agg acc ttg gac etc ttg att gcc aga ggt gac cat get tac cge caa 5615 
Arg Thr Leu Asp Leu Leu lie Ala Arg Gly Asp His Ala Tyr Arg Gin 
1860 1865 1870 

ttg gag agg gac acc etc aac gag gca aag atg tgg tac atg caa get 5663 
Leu Glu Arg Asp Thr Leu Asn Glu Ala Lys Met Trp Tyr Met Gin Ala 
1875 1880 1885 

etc eac etc ttg ggt gae aag cca tac etc cca etc age acc act tgg 5711 
Leu His Leu Leu Gly Asp Lys Pro Tyr Leu Pro Leu Ser Thr Thr Trp 
1890 1895 1900 

tec gac cca agg ttg gac cgt get get gac ate acc act cag aac get 5759 
Ser Asp Pro Arg Leu Asp Arg Ala Ala Asp lie Thr Thr Gin Asn Ala 
1905 1910 1915 

cat gac tct gcc att gtt get etc agg cag aac ate cca act cct get 5807 
His Asp Ser Ala lie Val Ala Leu Arg Gin Asn lie Pro Thr Pro Ala 
1920 1925 1930 1935 

cca etc tec etc aga tct get aac acc etc act gac ttg ttc etc cca 5855 
Pro Leu Ser Leu Arg Ser Ala Asn Thr Leu Thr Asp Leu Phe Leu Pro 
1940 1945 1950 

cag ate aac gag gtc atg atg aac tac tgg caa acc ttg get caa agg 5903 
Gin lie Asn Glu Val Met Met Asn Tyr Trp Gin Thr Leu Ala Gin Arg 
1955 1960 1965 

gtc tac aac etc aga cae aac etc tec att gat ggt caa cca etc tac 5951 
Val Tyr Asn Leu Arg His Asn Leu Ser lie Asp Gly Gin Pro Leu Tyr 
1970 1975 1980 

etc cca ate tac gee aca cca get gae cca aag get ctt etc tct get 5999 
Leu Pro lie Tyr Ala Thr Pro Ala Asp Pro Lys Ala Leu Leu Ser Ala 
1985 1990 1995 

get gtg get acc age caa ggt ggt ggc aag etc cca gag tec ttc atg 6047 
Ala Val Ala Thr Ser Gin Gly Gly Gly Lys Leu Pro Glu Ser Phe Met 
2000 2005 2010 2015 

tec etc tgg agg ttc cca eac atg ttg gag aac gee cgt ggc atg gtc 6095 
Ser Leu Trp Arg Phe Pro His Met Leu Glu Asn Ala Arg Gly Met Val 
2020 2025 2030

tcc caa ctc acc cag ttc ggt tcc acc ctc cag aac atc att gag agg 6143
Ser Gln Leu Thr Gln Phe Gly Ser Thr Leu Gln Asn Ile Ile Glu Arg
2035 2040 2045

caa gat gct gag gct ctc aac gct ttg ctc cag aac cag gca gct gag 6191
Gln Asp Ala Glu Ala Leu Asn Ala Leu Leu Gln Asn Gln Ala Ala Glu
2050 2055 2060

ttg atc ctc acc aac ttg tcc atc caa gac aag acc att gag gag ctt 6239
Leu Ile Leu Thr Asn Leu Ser Ile Gln Asp Lys Thr Ile Glu Glu Leu
2065 2070 2075

gat gct gag aag aca gtc ctt gag aag agc aag gct ggt gcc caa tct 6287
Asp Ala Glu Lys Thr Val Leu Glu Lys Ser Lys Ala Gly Ala Gln Ser
2080 2085 2090 2095

cgc ttc gac tcc tac ggc aag ctc tac gat gag aac atc aac gct ggt 6335
Arg Phe Asp Ser Tyr Gly Lys Leu Tyr Asp Glu Asn Ile Asn Ala Gly
2100 2105 2110

gag aac cag gcc atg acc ctc agg gct tcc gca gct ggt ctc acc act 6383
Glu Asn Gln Ala Met Thr Leu Arg Ala Ser Ala Ala Gly Leu Thr Thr
2115 2120 2125

gct gtc caa gcc tct cgc ttg gct ggt gca gct gct gac ctc gtt cca 6431
Ala Val Gln Ala Ser Arg Leu Ala Gly Ala Ala Ala Asp Leu Val Pro
2130 2135 2140

aac atc ttc ggt ttc gct ggt ggt ggc tcc aga tgg ggt gcc att gct 6479
Asn Ile Phe Gly Phe Ala Gly Gly Gly Ser Arg Trp Gly Ala Ile Ala
2145 2150 2155

gag gct acc ggt tac gtc atg gag ttc tct gcc aac gtc atg aac act 6527
Glu Ala Thr Gly Tyr Val Met Glu Phe Ser Ala Asn Val Met Asn Thr
2160 2165 2170 2175

gag gct gac aag atc agc caa tct gag acc tac aga agg cgc cgt caa 6575
Glu Ala Asp Lys Ile Ser Gln Ser Glu Thr Tyr Arg Arg Arg Arg Gln
2180 2185 2190

gag tgg gag atc caa agg aac aac gct gag gca gag ttg aag caa atc 6623
Glu Trp Glu Ile Gln Arg Asn Asn Ala Glu Ala Glu Leu Lys Gln Ile
2195 2200 2205

gat gct caa ctc aag tcc ttg gct gtc aga agg gag gct gct gtc ctc 6671
Asp Ala Gln Leu Lys Ser Leu Ala Val Arg Arg Glu Ala Ala Val Leu
2210 2215 2220

cag aag acc tcc ctc aag acc caa cag gag caa acc cag tcc cag ttg 6719
Gln Lys Thr Ser Leu Lys Thr Gln Gln Glu Gln Thr Gln Ser Gln Leu
2225 2230 2235

gct ttc ctc caa agg aag ttc tcc aac cag gct ctc tac aac tgg ctc 6767
Ala Phe Leu Gln Arg Lys Phe Ser Asn Gln Ala Leu Tyr Asn Trp Leu
2240 2245 2250 2255

aga ggc cgc ttg gct gcc atc tac ttc caa ttc tac gac ctt gct gtg 6815
Arg Gly Arg Leu Ala Ala Ile Tyr Phe Gln Phe Tyr Asp Leu Ala Val
2260 2265 2270

gcc agg tgc ctc atg gct gag caa gcc tac cgc tgg gag ttg aac gat 6863
Ala Arg Cys Leu Met Ala Glu Gln Ala Tyr Arg Trp Glu Leu Asn Asp
2275 2280 2285

gae tec gcc agg ttc ate aag cca ggt get tgg eaa ggc ace tac get 6911 
Asp Ser Ala Arg Phe He Lys Pro Gly Ala Trp Gin Gly Thr Tyr Ala 
2290 2295 2300 

ggt etc ctt get ggt gag ace etc atg etc tee ttg get eaa atg gag 6959 
Gly Leu Leu Ala Gly Glu Thr Leu Met Leu Ser Leu Ala Gin Met Glu 
2305 2310 2315 

gat get eac etc aag agg gac aag agg get ttg gag gtg gag agg aca 7007 
Asp Ala His Leu Lys Arg Asp Lys Arg Ala Leu Glu Val Glu Arg Thr 
2320 2325 2330 2335 

gtc tec ctt get gag gtc tac get ggt etc cca aag gac aac ggt cca 7055 
Val Ser Leu Ala Glu Val Tyr Ala Gly Leu Pro Lys Asp Asn Gly Pro 
^ 2340 2345 2350 

ttc tee ctt get eaa gag att gae aag ttg gtc age eaa ggt tet ggt 7103 
Phe Ser Leu Ala Gin Glu He Asp Lys Leu Val Ser Gin Gly Ser Gly 
2355 2360 2365 

tet get ggt tet ggt aac aac aac ttg get ttc ggc get ggt act gac 7151 
Ser Ala Gly Ser Gly Asn Asn Asn Leu Ala Phe Gly Ala Gly Thr Asp 
2370 2375 2380 

acc aag acc tec etc caa gcc tet gtc tec ttc get gac etc aag ate 7199 
Thr Lys Thr Ser Leu Gin Ala Ser Val Ser Phe Ala Asp Leu Lys He 
2385 2390 2395 

agg gag gac tac cca get tee ctt ggc aag ate agg cgc ate aag eaa 7247 
Arg Glu Asp Tyr Pro Ala Ser Leu Gly Lys He Arg Arg He Lys Gin 
2400 2405 2410 2415 

ate tet gtc ace etc cca get etc ttg ggt cca tac eaa gat gtc eaa 7295 
He Ser Val Thr Leu Pro Ala Leu Leu Gly Pro Tyr Gin Asp Val Gin 
2420 2425 2430 

gca ate etc tec tac ggt gac aag get ggt ttg geg aac ggt tgc gag 7343 
Ala lie Leu Ser Tyr Gly Asp Lys Ala Gly Leu Ala Asn Gly Cys Glu 
2435 2440 2445 

get ctt get gtc tet eat ggc atg aac gac tet ggt caa ttc caa ctt 7391 
Ala Leu Ala Val Ser His Gly Met Asn Asp Ser Gly Gin Phe Gin Leu 
2450 2455 2460 



gac ttc aac gat ggc aag ttc ctc cca ttc gag ggc att gcc att gac 7439
Asp Phe Asn Asp Gly Lys Phe Leu Pro Phe Glu Gly Ile Ala Ile Asp
2465 2470 2475



eaa ggc acc etc ace etc tec ttc cca aac get tee atg cca gag aag 7487 
Gin Gly Thr Leu Thr Leu Ser Phe Pro Asn Ala Ser Met Pro Glu Lys 
2480 2485 2490 2495 

gga aag caa gcc acc atg etc aag acc etc aac gat ate ate etc eac 7535 
Gly Lys Gin Ala Thr Met Leu Lys Thr Leu Asn Asp He He Leu His 
2500 2505 2510 

atc agg tac acc atc aag tgagctcgag aggcctgcgg ccgc 7577
Ile Arg Tyr Thr Ile Lys
2515

<210> 4
<211> 7541
<212> DNA
<213> Artificial Sequence

<220>
<221> CDS
<222> (3)..(7517)

<220>
<223> Description of Artificial Sequence: hemicot tcbA

<400> 4
cc atg gct cag aac tcc ctc agc tcc acc att gac acc atc tgc cag 47
Met Ala Gln Asn Ser Leu Ser Ser Thr Ile Asp Thr Ile Cys Gln
1 5 10 15

aag ctt caa etc acc tge cea get gag ate gcc etc tac cea ttc gac 95 
Lys Leu Gin Leu Thr Cys Pro Ala Glu He Ala Leu Tyr Pro Phe Asp 
20 25 30 

acc ttc cgt gag aag acc aga ggc atg gtc aac tgg ggt gag gcc aag 143 
Thr Phe Arg Glu Lys Thr Arg Gly Met Val Asn Trp Gly Glu Ala Lys 
35 40 45 

agg ate tac gag att get caa get gag caa gac agg aac etc ctt cat 191 
Arg He Tyr Glu He Ala Gin Ala Glu Gin Asp Arg Asn Leu Leu His 
50 55 60 

gag aag agg ate ttc gee tac get aac cea ttg etc aag aac get gtc 239 
Glu Lys Arg He Phe Ala Tyr Ala Asn Pro Leu Leu Lys Asn Ala Val 
65 70 75 

agg ctt ggt acc agg caa atg ttg ggt ttc ate caa ggt tac tet gac 287 
Arg Leu Gly Thr Arg Gin Met Leu Gly Phe He Gin Gly Tyr Ser Asp 
80 85 90 95 

ttg ttc ggc aac agg get gac aac tac gea get cet ggt tet gtt get 335 
Leu Phe Gly Asn Arg Ala Asp Asn Tyr Ala Ala Pro Gly Ser Val Ala 
100 105 110 

age atg ttc age cea get gee tac etc act gag ttg tac cgt gag gee 383 
Ser Met Phe Ser Pro Ala Ala Tyr Leu Thr Glu Leu Tyr Arg Glu Ala 
115 120 125 

aag aac etc cat gac age tec age ate tac tac ctt gac aag agg cgc 431 
Lys Asn Leu His Asp Ser Ser Ser He Tyr Tyr Leu Asp Lys Arg Arg 
130 135 140 



cca gac ctt gct tcc ttg atg ctc tcc cag aag aac atg gat gag gag 479
Pro Asp Leu Ala Ser Leu Met Leu Ser Gln Lys Asn Met Asp Glu Glu
145 150 155



ate age acc ttg get etc tec aac gag ctt tge ttg get gge att gag 527 
He Ser Thr Leu Ala Leu Ser Asn Glu Leu Cys Leu Ala Gly He Glu 
160 165 170 175 
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acc aag act ggc aag tec caa gat gag gtc atg gac atg etc tec ace 575 
Thr Lys Thr Gly Lys Ser Gin Asp Glu Val Met Asp Met Leu Ser Thr 
180 185 190 

tac cgc etc tct ggt gag act cca tac cac cat get tac gag act gtc 623 
Tyr Arg Leu Ser Gly Glu Thr Pro Tyr His His Ala Tyr Glu Thr Val 
195 200 205 

agg gag att gtc cat gag agg gac cca ggt ttc cgc cac etc tec caa 671 
Arg Glu He Val His Glu Arg Asp Pro Gly Phe Arg His Leu Ser Gin 
210 215 220 

get cec att gtg get gee aag ttg gac cca gtc ace etc ttg ggc ate 719 
Ala Pro He Val Ala Ala Lys Leu Asp Pro Val Thr Leu Leu Gly He 
225 230 235 

tec age cac ate age cca gag ttg tac aac ett etc att gag gag ate 767 
Ser Ser His He Ser Pro Glu Leu Tyr Asn Leu Leu He Glu Glu He 
240 245 250 255 

cca gag aag gat gag gca get ttg gac acc etc tac aag acc aac ttc 815 
Pro Glu Lys Asp Glu Ala Ala Leu Asp Thr Leu Tyr Lys Thr Asn Phe 
260 265 270 

ggt gac ate acc act get caa etc atg age cca tec tac ttg gee agg 863 
Gly Asp He Thr Thr Ala Gin Leu Met Ser Pro Ser Tyr Leu Ala Arg 
275 280 285 

tac tac ggt gtc tct cca gag gac att get tac gtc ace aca age etc 911 
Tyr Tyr Gly Val Ser Pro Glu Asp He Ala Tyr Val Thr Thr Ser Leu 
290 295 300 

tec cat gtg ggt tac tec tct gac ate ctt gtc ate cca etc gtg gat 959 
Ser His Val Gly Tyr Ser Ser Asp He Leu Val He Pro Leu Val Asp 
305 310 315 

ggt gtg ggc aag atg gag gtt gtc agg gtc acc agg act cca tct gac 1007 
Gly Val Gly Lys Met Glu Val Val Arg Val Thr Arg Thr Pro Ser Asp 
320 325 330 335 

aac tac acc tec cag acc aac tac att gag ttg tac cca caa ggt ggt 1055 
Asn Tyr Thr Ser Gin Thr Asn Tyr He Glu Leu Tyr Pro Gin Gly Gly 
340 345 350 



gac aac tac ctc atc aag tac aac ctc tcc aac tct ttc ggt ttg gat 1103
Asp Asn Tyr Leu Ile Lys Tyr Asn Leu Ser Asn Ser Phe Gly Leu Asp
355 360 365

gac ttc tac ctc cag tac aag gat ggt tct gct gac tgg act gag att 1151
Asp Phe Tyr Leu Gln Tyr Lys Asp Gly Ser Ala Asp Trp Thr Glu Ile
370 375 380

gct cac aac cca tac cca gac atg gtc atc aac cag aag tac gag tcc 1199
Ala His Asn Pro Tyr Pro Asp Met Val Ile Asn Gln Lys Tyr Glu Ser
385 390 395

caa gcc acc atc aag aga tct gac tct gac aac atc ctc tcc att ggt 1247
Gln Ala Thr Ile Lys Arg Ser Asp Ser Asp Asn Ile Leu Ser Ile Gly
400 405 410 415

ctc caa agg tgg cac tct ggt tcc tac aac ttc gct gct gcc aac ttc 1295
Leu Gln Arg Trp His Ser Gly Ser Tyr Asn Phe Ala Ala Ala Asn Phe
420 425 430

aag att gac caa tac tct cca aag gct ttc ctc ttg aag atg aac aag 1343
Lys Ile Asp Gln Tyr Ser Pro Lys Ala Phe Leu Leu Lys Met Asn Lys
435 440 445

gcc atc agg ctc ttg aag gcc act ggt ctc tcc ttc gcc acc ctt gag 1391
Ala Ile Arg Leu Leu Lys Ala Thr Gly Leu Ser Phe Ala Thr Leu Glu
450 455 460

agg att gtg gac tct gtc aac tcc acc aag tcc atc act gtg gag gtc 1439
Arg Ile Val Asp Ser Val Asn Ser Thr Lys Ser Ile Thr Val Glu Val
465 470 475

ctc aac aag gtc tac aga gtc aag ttc tac att gac cgc tac ggc atc 1487
Leu Asn Lys Val Tyr Arg Val Lys Phe Tyr Ile Asp Arg Tyr Gly Ile
480 485 490 495

tct gag gag act gct gcc atc ctt gcc aac atc aac atc tcc cag caa 1535
Ser Glu Glu Thr Ala Ala Ile Leu Ala Asn Ile Asn Ile Ser Gln Gln
500 505 510

gct gtc ggc aac cag ctc tcc caa ttc gag caa ctc ttc aac cac cct 1583
Ala Val Gly Asn Gln Leu Ser Gln Phe Glu Gln Leu Phe Asn His Pro
515 520 525

cca ctc aac ggc atc cgc tac gag atc agc gag gac aac tcc aag cac 1631
Pro Leu Asn Gly Ile Arg Tyr Glu Ile Ser Glu Asp Asn Ser Lys His
530 535 540

ctc cca aac cca gac ctc aac ctc aag cca gac tcc act ggt gat gac 1679
Leu Pro Asn Pro Asp Leu Asn Leu Lys Pro Asp Ser Thr Gly Asp Asp
545 550 555

caa agg aag gct gtc ctc aag agg gct ttc caa gtc aac gct tct gag 1727
Gln Arg Lys Ala Val Leu Lys Arg Ala Phe Gln Val Asn Ala Ser Glu
560 565 570 575

ctt tac caa atg ctc ttg atc act gac agg aag gag gat ggt gtc atc 1775
Leu Tyr Gln Met Leu Leu Ile Thr Asp Arg Lys Glu Asp Gly Val Ile
580 585 590

aag aac aac ttg gag aac ctc tct gac ctc tac ctt gtc tcc ctc ttg 1823
Lys Asn Asn Leu Glu Asn Leu Ser Asp Leu Tyr Leu Val Ser Leu Leu
595 600 605

gcc caa atc cac aac ttg acc att gct gag ttg aac atc ctc ttg gtc 1871
Ala Gln Ile His Asn Leu Thr Ile Ala Glu Leu Asn Ile Leu Leu Val
610 615 620

atc tgc ggt tac ggt gac acc aac atc tac caa atc act gac gac aac 1919
Ile Cys Gly Tyr Gly Asp Thr Asn Ile Tyr Gln Ile Thr Asp Asp Asn
625 630 635

ctt gcc aag att gtg gag acc ctc ttg tgg atc acc caa tgg ctc aag 1967
Leu Ala Lys Ile Val Glu Thr Leu Leu Trp Ile Thr Gln Trp Leu Lys
640 645 650 655

acc cag aag tgg act gtc aca gac ctc ttc ctc atg acc act gcc acc 2015
Thr Gln Lys Trp Thr Val Thr Asp Leu Phe Leu Met Thr Thr Ala Thr
660 665 670

tac tcc acc act ctc act cca gag att tcc aac ctc act gcc acc ctc 2063
Tyr Ser Thr Thr Leu Thr Pro Glu Ile Ser Asn Leu Thr Ala Thr Leu
675 680 685

agc tcc acc ctc cac ggc aag gag tcc ctc att ggt gag gac ctc aag 2111
Ser Ser Thr Leu His Gly Lys Glu Ser Leu Ile Gly Glu Asp Leu Lys
690 695 700

agg gca atg gct cca tgc ttc acc tct gct ctc cac ctc acc tcc caa 2159
Arg Ala Met Ala Pro Cys Phe Thr Ser Ala Leu His Leu Thr Ser Gln
705 710 715

gag gtg gct tac gac ctc ctt ctc tgg att gac caa atc caa cca gct 2207
Glu Val Ala Tyr Asp Leu Leu Leu Trp Ile Asp Gln Ile Gln Pro Ala
720 725 730 735

caa atc act gtg gat ggt ttc tgg gag gag gtc caa acc act cca acc 2255
Gln Ile Thr Val Asp Gly Phe Trp Glu Glu Val Gln Thr Thr Pro Thr
740 745 750

tcc ctc aag gtc atc acc ttc gct caa gtc ttg gct caa ctc tcc ctc 2303
Ser Leu Lys Val Ile Thr Phe Ala Gln Val Leu Ala Gln Leu Ser Leu
755 760 765

atc tac aga agg att ggt ctc tct gag act gag ttg tcc ctc att gtc 2351
Ile Tyr Arg Arg Ile Gly Leu Ser Glu Thr Glu Leu Ser Leu Ile Val
770 775 780

acc caa tcc agc ctc ttg gtc gct ggc aag tcc atc ctt gat cat ggt 2399
Thr Gln Ser Ser Leu Leu Val Ala Gly Lys Ser Ile Leu Asp His Gly
785 790 795

ctc ttg acc ctc atg gct ctt gag ggt ttc cac acc tgg gtc aac ggt 2447
Leu Leu Thr Leu Met Ala Leu Glu Gly Phe His Thr Trp Val Asn Gly
800 805 810 815

ttg ggt caa cat gct tcc ctc atc ttg gct gca ctc aag gat ggt gct 2495
Leu Gly Gln His Ala Ser Leu Ile Leu Ala Ala Leu Lys Asp Gly Ala
820 825 830

ctc acc gtc acc gat gtg gct caa gcc atg aac aag gag gag tcc ctc 2543
Leu Thr Val Thr Asp Val Ala Gln Ala Met Asn Lys Glu Glu Ser Leu
835 840 845

ttg caa atg gct gcc aac cag gtg gag aag gac ctc acc aag ctc acc 2591
Leu Gln Met Ala Ala Asn Gln Val Glu Lys Asp Leu Thr Lys Leu Thr
850 855 860

tcc tgg acc caa atc gat gcc atc ctc caa tgg ctc caa atg tcc tct 2639
Ser Trp Thr Gln Ile Asp Ala Ile Leu Gln Trp Leu Gln Met Ser Ser
865 870 875

gct ctt gct gtc agc cca ttg gac ctt gct ggc atg atg gct ctc aag 2687
Ala Leu Ala Val Ser Pro Leu Asp Leu Ala Gly Met Met Ala Leu Lys
880 885 890 895

tac ggc att gat cac aac tac gct gcc tgg caa gca gct gcc gct gcc 2735
Tyr Gly Ile Asp His Asn Tyr Ala Ala Trp Gln Ala Ala Ala Ala Ala
900 905 910

ctc atg gct gac cat gcc aac cag gct cag aag aag ttg gat gag acc 2783
Leu Met Ala Asp His Ala Asn Gln Ala Gln Lys Lys Leu Asp Glu Thr
915 920 925

ttc tcc aag gct ctc tgc aac tac tac atc aac gcc gtg gtt gac tct 2831
Phe Ser Lys Ala Leu Cys Asn Tyr Tyr Ile Asn Ala Val Val Asp Ser
930 935 940

gct gcc ggt gtc agg gac agg aac ggt ctc tac acc tac ctc ttg att 2879
Ala Ala Gly Val Arg Asp Arg Asn Gly Leu Tyr Thr Tyr Leu Leu Ile
945 950 955

gac aac cag gtc tct gct gat gtc atc acc tcc aga att gct gag gcc 2927
Asp Asn Gln Val Ser Ala Asp Val Ile Thr Ser Arg Ile Ala Glu Ala
960 965 970 975

att gct ggc atc caa ctc tac gtc aac agg gct ctc aac agg gat gag 2975
Ile Ala Gly Ile Gln Leu Tyr Val Asn Arg Ala Leu Asn Arg Asp Glu
980 985 990

ggt cag ttg gct tct gat gtc tcc acc agg caa ttc ttc acc gac tgg 3023
Gly Gln Leu Ala Ser Asp Val Ser Thr Arg Gln Phe Phe Thr Asp Trp
995 1000 1005



gag agg tac aac aag agg tac tec ace tgg get ggt gtc tet gag ttg 3071 
Glu Arg Tyr Asn Lys Arg Tyr Ser Thr Trp Ala Gly Vai Ser Glu Leu 
1010 1015 1020 

gtc tac tac cca gag aac tac gtg gae cea ace caa agg att ggt cag 3119 
Val Tyr Tyr Pro Glu Asn Tyr Val Asp Pro Thr Gin Arg He Gly Gin 
1025 1030 1035 

ace aag atg atg gat get ttg etc caa tee ate aac cag tec caa etc 3167 
Thr Lys Met Met Asp Ala Leu Leu Gin Ser lie Asn Gin Ser Gin Leu 
1040 1045 1050 1055 

aac get gac act gtg gag gat get ttc aag acc tac etc ace tec ttc 3215 
Asn Ala Asp Thr Val Glu Asp Ala Phe Lys Thr Tyr Leu Thr Ser Phe 
1060 1065 1070 

gag caa gtg gee aac etc aag gtc ate tet get tac cat gac aac gtc 3263 
Glu Gin Val Ala Asn Leu Lys Val He Ser Ala Tyr His Asp Asn Val 
1075 1080 1085 

aac gtg gac caa ggt etc acc tac ttc att ggc att gac caa gcc get 3311 
Asn Val Asp Gin Gly Leu Thr Tyr Phe He Gly He Asp Gin Ala Ala 
1090 1095 1100 

ect ggc acc tac tac tgg agg tet gtg gae cac tec aag tgc gag aac 3359 
Pro Gly Thr Tyr Tyr Trp Arg Ser Val Asp His Ser Lys Cys Glu Asn 
1105 1110 1115 

ggc aag ttc get gcc aac get tgg ggt gag tgg aac aag ate ace tgc 3407 
Gly Lys Phe Ala Ala Asn Ala Trp Gly Glu Trp Asn Lys He Thr Cys 
1120 1125 1130 1135 



gct gtc aac cct tgg aag aac atc atc agg cca gtg gtc tac atg tcc 3455
Ala Val Asn Pro Trp Lys Asn Ile Ile Arg Pro Val Val Tyr Met Ser
1140 1145 1150
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aga etc tac ttg etc tgg ett gag eaa cag tec aag aag tet gat gac 3503 
Arg Leu Tyr Leu Leu Trp Leu Glu Gin Gin Ser Lys Lys Ser Asp Asp 
1155 1160 1165 

ggc aag aea act ate tac cag tac aac etc aag ttg get cae ate cgc 3551 
Gly Lys Thr Thr lie Tyr Gin Tyr Asn Leu Lys Leu Ala His lie Arg 
1170 1175 1180 

tac gat ggt tec tgg aac act eca ttc acc ttc gat gte act gag aag 3599 
Tyr Asp Gly Ser Trp Asn Thr Pro Phe Thr Phe Asp Val Thr Glu Lys 
1185 1190 1195 

gte aag aac tac acc tec age act gat gea get gag tec ett ggt etc 3647 
Val Lys Asn Tyr Thr Ser Ser Thr Asp Ala Ala Glu Ser Leu Gly Leu 
1200 1205 1210 1215 

tac tgc act ggt tac eaa ggt gag gac ace etc ttg gte atg ttc tac 3695 
Tyr Cys Thr Gly Tyr Gin Gly Glu Asp Thr Leu Leu Val Met Phe Tyr 
1220 122S 1230 

tec atg caa tec age tac tec age tac act gac aac aac get eca gte 3743 
Ser Met Gin Ser Ser Tyr Ser Ser Tyr Thr Asp Asn Asn Ala Pro Val 
1235 1240 1245 

act ggt etc tac ate ttc get gac atg tee tet gac aac atg ace aac 37 91 
Thr Gly Leu Tyr lie Phe Ala Asp Met Ser Ser Asp Asn Met Thr Asn 
1250 1255 1260 

get eaa gee acc aac tac tgg aac aac tec tac eca caa ttc gac act 3839 
Ala Gin Ala Thr Asn Tyr Trp Asn Asn Ser Tyr Pro Gin Phe Asp Thr 
1265 1270 1275 

gte atg get gac eca gac tet gac aac aag aag gte ate acc agg cgt 3887 
Val Met Ala Asp Pro Asp Ser Asp Asn Lys Lys Val lie Thr Arg Arg 
1280 1285 1290 1295 

gte aac aac cgc tac get gag gac tac gag ate eca age tet gte acc 3935 
Val Asn Asn Arg Tyr Ala Glu Asp Tyr Glu lie Pro Ser Ser Val Thr 
1300 1305 1310 

tee aac age aac tac tec tgg ggt gac cae tee etc acc atg etc tac 3983 
Ser Asn Ser Asn Tyr Ser Trp Gly Asp His Ser Leu Thr Met Leu Tyr 
1315 1320 1325 

ggt ggc tet gte eca aac ate acc ttc gag tet gea get gag gac etc 4 031 
Gly Gly Ser Val Pro Asn lie Thr Phe Glu Ser Ala Ala Glu Asp Leu 
1330 1335 1340 

agg etc tec ace aac atg get etc tec ate att cae aac ggt tac get 4079 
Arg Leu Ser Thr Asn Met Ala Leu Ser lie lie His Asn Gly Tyr Ala 
1345 1350 1355 

ggc ace agg cgc ate eaa tgc aac etc atg aag eaa tac get tec ett 4127 
Gly Thr Arg Arg lie Gin Cys Asn Leu Met Lys Gin Tyr Ala Ser Leu 
1360 1365 1370 1375 

ggt gac aag ttc att ate tac gac tec age ttc gat gac gee aac agg 4175 
Gly Asp Lys Phe lie lie Tyr Asp Ser Ser Phe Asp Asp Ala Asn Arg 
1380 1385 1390 

ttc aac ttg gte eca etc ttc aag ttc ggc aag gat gag aac tet gat 4223 
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Phe Asn Leu Val Pro Leu Phe Lys Phe Gly Lys Asp Glu Asn Ser Asp 
1395 1400 1405 

gac tec ate tgc ate tac aac gag aac cca age tct gag gac aag aag 4271 
Asp Ser lie Cys lie Tyr Asn Glu Asn Pro Ser Ser Glu Asp Lys Lys 
1410 1415 1420 

tgg tac ttc age tec aag gac gac aac aag act get gac tac aac ggt 4319 
Trp Tyr Phe Ser Ser Lys Asp Asp Asn Lys Thr Ala Asp Tyr Asn Gly 
1425 1430 1435 

ggc ace caa tgc att gat get gge ace tec aac aag gac ttc tac tac 4367 
Gly Thr Gin Cys lie Asp Ala Gly Thr Ser Asn Lys Asp Phe Tyr Tyr 
1440 1445 1450 1455 

aac etc caa gag att gag gtc ate tct gtc act ggt ggc tac tgg tec 4415 
Asn Leu Gin Glu He Glu Vai He Ser Val Thr Gly Gly Tyr Trp Ser 
1460 1465 1470 

age tac aag ate age aac cec ate aac ate aac act ggc att gac tct '4463 
Ser Tyr Lys He Ser Asn Pro He Asn He Asn Thr Gly He Asp Ser 
1475 1480 1485 

gee aag gtc aag gtc act gtc aag get ggt gge gat gac caa ate ttc 4 511 
Ala Lys Val Lys Val Thr Val Lys Ala Gly Gly Asp Asp Gin He Phe 
1490 1495 1500 

act get gac aac tee ace tac gtc cca cag caa ect get cca tec ttc 4559 
Thr Ala Asp Asn Ser Thr Tyr Vai Pro Gin Gin Pro Ala Pro Ser Phe 
1505 1510 1515 

gag gag atg ate tac caa ttc aac aac etc ace att gac tgc aag aac 4 607 
Glu Glu Met He Tyr Gin Phe Asn Asn Leu Thr He Asp Cys Lys Asn 
1520 1525 1530 1535 

etc aac ttc att gac aac cag get cae att gag att gac ttc act gee 4655 
Leu Asn Phe He Asp Asn Gin Ala His He Glu He Asp Phe Thr Ala 
1540 1545 1550 

aca get caa gat ggc cgc ttc ttg ggt get gag acc ttc ate att cca 4703 
Thr Ala Gin Asp Gly Arg Phe Leu Gly Ala Glu Thr Phe He lie Pro 
1555 1560 1565 

gtc acc aag aag gtc ett ggc act gag aac gtc att get etc tac tct 4751 
Vai Thr Lys Lys Val Leu Gly Thr Glu Asn Val He Ala Leu Tyr Ser 
1570 1575 1580 

gag aac aac ggt gtc cag tac atg caa att ggt get tac aga acc agg 4799 
Glu Asn Asn Gly Vai Gin Tyr Met Gin He Gly Ala Tyr Arg Thr Arg 
1585 1590 1595 

etc aac acc etc ttc get caa cag ttg gtc tec egt gcc aac aga ggc 4 847 
Leu Asn Thr Leu Phe Ala Gin Gin Leu Vai Ser Arg Ala Asn Arg Gly 
1600 1605 - 1610 1615 

att gat get gtc etc age atg gag act cag aac ate caa gag cca caa- 4895 
He Asp Ala Vai Leu Ser Met Glu Thr Gin Asn He Gin Glu Pro Gin 
1620 1625 1630 

ett ggt get gge ace tac gtc caa ett gtc ttg gac aag tac gat gag " 4943 
Leu Gly Ala Gly Thr Tyr Val Gin Leu Vai Leu Asp Lys Tyr Asp Glu 
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1635 1640 1645 

tec att cat ggc acc aac aag tec ttc gee att gag tac gtg gac ate 4991 
Ser lie His Gly Thr Asn Lys Ser Phe Ala lie Glu Tyr Val Asp lie 
1650 1655 1660 

ttc aag gag aac gac tec ttc gtc ate tac caa ggt gag ttg tct gag 5039 
Phe Lys Glu Asn Asp Ser Phe Val lie Tyr Gin Gly Glu Leu Ser Glu 
1665 1670 1675 

acc tec caa act gtg gtc aag gtc ttc etc tec tac ttc att gag gee 5087 
Thr Ser Gin Thr Val Val Lys Val Phe Leu Ser Tyr Phe lie Glu Ala 
1680 1685 1690 1695 

ace ggt aac aag aac cac etc tgg gtc agg gee aag tac cag aag gag 5135 
Thr Gly Asn Lys Asn His Leu Trp Val Arg Ala Lys Tyr Gin Lys Glu 
1700 1705 1710 

acc act gac aag ate etc ttc gac agg act gat gag aag gac cca cat 5183 
Thr Thr Asp Lys lie Leu Phe Asp Arg Thr Asp Glu Lys Asp Pro His 
1715 1720 1725 

ggt tgg ttc etc tct gat gac cac aag ace ttc tct ggt etc age tct 5231 
Gly Trp Phe Leu Ser Asp Asp His Lys Thr Phe Ser Gly Leu Ser Ser 
,1730 1735 1740 

get caa get etc aag aac gac tct gag cca atg gac ttc tct ggt gee 527 9 
Ala Gin Ala Leu Lys Asn Asp Ser Glu Pro Met Asp Phe Ser Gly Ala 
1745 1750 1755 

aac get etc tac ttc tgg gag ttg ttc tac tac act cca atg atg atg 5327 
Asn Ala Leu Tyr Phe Trp Glu Leu Phe Tyr Tyr Thr Pro Met Met Met 
1760 1765 1770 1775 

get cac agg etc ett caa gag cag aac ttc gat get gcc aac cac tgg 5375 
Ala His Arg Leu Leu Gin Glu Gin Asn Phe Asp Ala Ala Asn His Trp 
1780 1785 1790 

ttc cge tac gtc tgg age eea tct ggt tac att gtg gat ggc aag att 5423 
Phe Arg Tyr Val Trp Ser Pro Ser Gly Tyr lie Val Asp Gly Lys lie 
1795 1800 1805 

gcc ate tac cac tgg aac gtc agg cca ttg gag gag gac acc tec tgg 5471 
Ala lie Tyr His Trp Asn Val Arg Pro Leu Glu Glu Asp Thr Ser Trp 
1810 1815 1820 

aac get cag caa ett gac tee act gac cca gat get gtg get caa gat 5519 
Asn Ala Gin Gin Leu Asp Ser Thr Asp Pro Asp Ala Val Ala Gin Asp 
1825 1830 1835 

gac cca atg cac tac aag gtg gcc acc ttc atg gcc acc ttg gac ett 5567 
Asp Pro Met His Tyr Lys Val Ala Thr Phe Met Ala Thr Leu Asp Leu 
1840 1845 1850 1855 

etc atg gcc aga ggt gat get gee tac cgc caa ttg gag agg gac ace 5615 
Leu Met Ala Arg Gly Asp Ala Ala Tyr Arg Gin Leu Glu Arg Asp Thr 
1860 1865 1870 

ttg get gag gee aag atg tgg tac acc caa get etc aac ttg ctg ggt 5663 
Leu Ala Glu Ala Lys Met Trp Tyr Thr Gin Ala Leu Asn Leu Leu Gly 
1875 1880 1885

gat gag cca caa gtc atg ctc tcc aca acc tgg gcc aac cca acc ttg 5711
Asp Glu Pro Gln Val Met Leu Ser Thr Thr Trp Ala Asn Pro Thr Leu
1890 1895 1900

ggc aac gct gcc tcc aag acc aca caa cag gtc agg caa cag gtc ctc 5759
Gly Asn Ala Ala Ser Lys Thr Thr Gln Gln Val Arg Gln Gln Val Leu
1905 1910 1915

acc caa ctc agg ctc aac tct aga gtc aag act cca ctc ttg ggc act 5807
Thr Gln Leu Arg Leu Asn Ser Arg Val Lys Thr Pro Leu Leu Gly Thr
1920 1925 1930 1935

gcc aac tcc ctc act gct ctc ttc ctc cca caa gag aac tcc aaa ctt 5855
Ala Asn Ser Leu Thr Ala Leu Phe Leu Pro Gln Glu Asn Ser Lys Leu
1940 1945 1950

aag ggt tac tgg agg acc ctt gct caa cgc atg ttc aac ctc agg cac 5903
Lys Gly Tyr Trp Arg Thr Leu Ala Gln Arg Met Phe Asn Leu Arg His
1955 1960 1965

aac ctc tcc att gat ggt caa cca ctc tcc ttg cca ctc tac gct aag 5951
Asn Leu Ser Ile Asp Gly Gln Pro Leu Ser Leu Pro Leu Tyr Ala Lys
1970 1975 1980

cca gct gac cca aag gct ctc ctt tcc gct gct gtc tcc gca tcc caa 5999
Pro Ala Asp Pro Lys Ala Leu Leu Ser Ala Ala Val Ser Ala Ser Gln
1985 1990 1995

ggt ggt gct gac ctc cca aag gct cca ctc acc atc cac agg ttc cca 6047
Gly Gly Ala Asp Leu Pro Lys Ala Pro Leu Thr Ile His Arg Phe Pro
2000 2005 2010 2015

caa atg ttg gag ggt gcc cgt ggt ctt gtc aac cag ctc atc caa ttc 6095
Gln Met Leu Glu Gly Ala Arg Gly Leu Val Asn Gln Leu Ile Gln Phe
2020 2025 2030

ggt tcc tct ctc ctt ggt tac tct gag agg caa gat gct gag gcc atg 6143
Gly Ser Ser Leu Leu Gly Tyr Ser Glu Arg Gln Asp Ala Glu Ala Met
2035 2040 2045

tcc caa ctc ttg caa acc cag gct tct gag ttg atc ctc acc tcc atc 6191
Ser Gln Leu Leu Gln Thr Gln Ala Ser Glu Leu Ile Leu Thr Ser Ile
2050 2055 2060

agg atg caa gac aac cag ctt gct gag ttg gac tct gag aag act gct 6239
Arg Met Gln Asp Asn Gln Leu Ala Glu Leu Asp Ser Glu Lys Thr Ala
2065 2070 2075

ctc caa gtc tcc ctt gct ggt gtc caa cag agg ttc gac agc tac tcc 6287
Leu Gln Val Ser Leu Ala Gly Val Gln Gln Arg Phe Asp Ser Tyr Ser
2080 2085 2090 2095

caa ctc tac gag gag aac atc aac gct ggt gag caa agg gct ttg gct 6335
Gln Leu Tyr Glu Glu Asn Ile Asn Ala Gly Glu Gln Arg Ala Leu Ala
2100 2105 2110

ctc agg tct gag tct gcc att gag tcc caa ggt gct caa atc tcc cgc 6383
Leu Arg Ser Glu Ser Ala Ile Glu Ser Gln Gly Ala Gln Ile Ser Arg
2115 2120 2125

atg gct ggt gct ggc gtg gac atg gct cca aac atc ttc ggt ctt gct 6431
Met Ala Gly Ala Gly Val Asp Met Ala Pro Asn Ile Phe Gly Leu Ala
2130 2135 2140

gat ggt ggc atg cac tac ggt gcc att gct tac gcc att gct gat ggc 6479
Asp Gly Gly Met His Tyr Gly Ala Ile Ala Tyr Ala Ile Ala Asp Gly
2145 2150 2155

att gag ctt tct gct tct gcc aag atg gtt gat gct gag aag gtg gct 6527
Ile Glu Leu Ser Ala Ser Ala Lys Met Val Asp Ala Glu Lys Val Ala
2160 2165 2170 2175

caa tct gaa atc tac cgt cgc aga cgc caa gaa tgg aag atc caa agg 6575
Gln Ser Glu Ile Tyr Arg Arg Arg Arg Gln Glu Trp Lys Ile Gln Arg
2180 2185 2190

gac aac gct caa gct gag atc aac cag ctc aac gct caa ctt gag tcc 6623
Asp Asn Ala Gln Ala Glu Ile Asn Gln Leu Asn Ala Gln Leu Glu Ser
2195 2200 2205

ctc agc atc agg cgt gag gct gct gag atg cag aag gag tac ctc aag 6671
Leu Ser Ile Arg Arg Glu Ala Ala Glu Met Gln Lys Glu Tyr Leu Lys
2210 2215 2220

acc caa cag gct caa gct cag gct caa ctc acc ttc ctc agg tcc aag 6719
Thr Gln Gln Ala Gln Ala Gln Ala Gln Leu Thr Phe Leu Arg Ser Lys
2225 2230 2235

ttc tcc aac cag gct ctc tac tcc tgg ctc aga ggc cgc ctc tct ggc 6767
Phe Ser Asn Gln Ala Leu Tyr Ser Trp Leu Arg Gly Arg Leu Ser Gly
2240 2245 2250 2255

atc tac ttc caa ttc tac gac ttg gct gtc tcc cgc tgc ctc atg gct 6815
Ile Tyr Phe Gln Phe Tyr Asp Leu Ala Val Ser Arg Cys Leu Met Ala
2260 2265 2270

gag caa tcc tac caa tgg gag gcc aac gac aac agc atc tcc ttc gtc 6863
Glu Gln Ser Tyr Gln Trp Glu Ala Asn Asp Asn Ser Ile Ser Phe Val
2275 2280 2285

aag cca ggt gct tgg caa ggc acc tac gct ggt ctc ctt tgc ggt gag 6911
Lys Pro Gly Ala Trp Gln Gly Thr Tyr Ala Gly Leu Leu Cys Gly Glu
2290 2295 2300

gct ctc atc cag aac ttg gct caa atg gag gag gct tac ctc aag tgg 6959
Ala Leu Ile Gln Asn Leu Ala Gln Met Glu Glu Ala Tyr Leu Lys Trp
2305 2310 2315

gag tee aga get ttg gag gta gag agg act gtc tec ett get gta gtc 7007 
Glu Ser Arg Ala Leu Glu Val Glu Arg Thr Val Ser Leu Ala Val Val 
2320 2325 2330 2335 

tae gae tec ttg gag ggc aac gae agg ttc aac ctt get gag caa ate 7055 
Tyr Asp Ser Leu Glu Gly Asn Asp Arg Phe Asn Leu Ala Glu Gin He 
2340 2345 2350 

cca get etc ttg gae aag ggt gag ggc act get ggc ace aag gag aac 7103 
Pro Ala Leu Leu Asp Lys Gly Glu Gly Thr Ala Gly Thr Lys Glu Asn 
2355 2360 2365 

ggt ctc tcc ttg gcc aac gcc atc ctc tct gct tct gtc aag ctc tct 7151
Gly Leu Ser Leu Ala Asn Ala Ile Leu Ser Ala Ser Val Lys Leu Ser
2370 2375 2380

gac etc aag ttg ggt act gac tac cca gac tec att gtg ggt tec aac 7199 
Asp Leu Lys Leu Gly Thr Asp Tyr Pro Asp Ser He Val Gly Ser Asn 
2385 2390 2395 

aag gtc aga agg ate aag caa ate tet gte tec etc cca get ttg gtg 7247 
Lys Val Arg Arg He Lys Gin He Ser Val Ser Leu Pro Ala Leu Val 
2400 2405 2410 2415 

ggt cca tac caa gat gtc caa gee atg etc tec tac ggt ggc tec ace 7295 
Gly Pro Tyr Gin Asp Val Gin Ala Met Leu Ser Tyr Gly Gly Ser Thr 
2420 2425 2430 

caa etc cca aag ggt tgc tct get ttg get gtc tec cac ggc ace aac 7343 
Gin Leu Pro Lys Gly Cys Ser Ala Leu Ala Val Ser His Gly Thr Asn 
2435 2440 2445 

gac tet ggt caa ttc caa ett gac ttc aac gat ggc aag tac etc cca . 7391 
Asp Ser Gly Gin Phe Gin Leu Asp Phe Asn Asp Gly Lys Tyr Leu Pro 
2450 2455 2460 

ttc gaa ggc att get ttg gat gac caa ggc acc etc aac etc caa ttc 74 39 
Phe Glu Gly He Ala Leu Asp Asp Gin Gly Thr Leu Asn Leu Gin Phe 
2465 2470 2475 

cca aac gee act gac aag cag aag gee ate etc caa acc atg tet gac 7487 
Pro Asn Ala Thr Asp Lys Gin Lys Ala He Leu Gin Thr Met Ser Asp 
2480 2485 2490 2495 

atc atc ctc cac atc agg tac acc atc agg tgagctcgag aggcctgcgg 7537
Ile Ile Leu His Ile Arg Tyr Thr Ile Arg
2500 2505

ccgc 7541

<210> 5
<211> 63
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: hemicot sequence
encoding ER signal from 15 kDa zein from Black
Mexican Sweet maize

<220>
<221> CDS
<222> (1)..(63)

<400> 5
atg gct aag atg gtc att gtg ctt gtg gtc tgc ttg gct ctc tct gct 48
Met Ala Lys Met Val Ile Val Leu Val Val Cys Leu Ala Leu Ser Ala
1 5 10 15

gcc tgt gct tca gcc 63
Ala Cys Ala Ser Ala
20

<210> 6
<211> 7621
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: hemicot tcdA
fused to the modified 15 kDa zein endoplasmic
reticulum signal peptide

<220>
<221> CDS
<222> (4)..(7614)

<400> 6
ncc atg gct aag atg gtc att gtg ctt gtg gtc tgc ttg gct ctc tct 48
Met Ala Lys Met Val Ile Val Leu Val Val Cys Leu Ala Leu Ser
1 5 10 15

gct gcc tgt gct tca gcc atg aac gag tcc gtc aag gag atc cca gac 96
Ala Ala Cys Ala Ser Ala Met Asn Glu Ser Val Lys Glu Ile Pro Asp
20 25 30

gtc ctc aag tcc caa tgc ggt ttc aac tgc ctc act gac atc tcc cac 144
Val Leu Lys Ser Gln Cys Gly Phe Asn Cys Leu Thr Asp Ile Ser His
35 40 45

agc tcc ttc aac gag ttc aga caa caa gtc tct gag cac ctc tcc tgg 192
Ser Ser Phe Asn Glu Phe Arg Gln Gln Val Ser Glu His Leu Ser Trp
50 55 60

tcc gag acc cat gac ctc tac cat gac gct cag caa gct cag aag gac 240
Ser Glu Thr His Asp Leu Tyr His Asp Ala Gln Gln Ala Gln Lys Asp
65 70 75

aac agg etc tac gag get agg ate etc aag agg get aae eea eaa etc 288 
Asn Arg Leu Tyr Glu Ala Arg lie Leu Lys Arg Ala Asn Pro Gin Leu 
80 85 90 95 

eag aae get gtc cac etc gee ate ttg get eea aae get gag ttg att 336 
Gin Asn Ala Val His Leu Ala lie Leu Ala Pro Asn Ala Glu Leu lie 
100 105 110 

ggt tac aac aac cag tte tct ggc aga get age eag tac gtg get cct 384 
Gly Tyr Asn Asn Gin Phe Ser Gly Arg Ala Ser Gin Tyr Val Ala Pro 
115 120 125 

ggt aca gtc tec tec atg tte age eea gee get tac etc act gag ttg 4 32 
Gly Thr Val Ser Ser Met Phe Ser Pro Ala Ala Tyr Leu Thr Glu Leu 
130 135 140 

tac ege gag get agg aae ctt eat get tet gac tee gtc tac tac ttg 480 
Tyr Arg Glu Ala Arg Asn Leu His Ala Ser Asp Ser Val Tyr Tyr Leu 
145 150 155 

gac aca ege aga eea gac etc aag age atg gee etc age eaa cag aac 528 
Asp Thr Arg Arg Pro Asp Leu Lys Ser Met Ala Leu Ser Gin Gin Asn 
160 165 170 175 

atg gac att gag ttg tec acc etc tec ttg age aac gag ctt etc ttg 576 
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Met Asp lie Glu Leu Ser Thr Leu Ser Leu Ser Asn Glu Leu Leu Leu 
180 185 190 

gag tec ate aag act gag age aag ttg gag aac tac acc aag gtc atg 624 

Glu Ser lie Lys Thr Glu Ser Lys Leu Glu Asn Tyr Thr Lys Val Met 
195 200 205 



gag atg etc tec acc ttc aga cca age ggt gca act cca tac cat gat 
Glu Met Leu Ser Thr Phe Arg Pro Ser Gly Ala Thr Pro Tyr His Asp 

210 215 220 



gag caa etc aac get tct cca gee att get ggt ttg atg eac cag gca 
Glu Gin Leu Asn Ala Ser Pro Ala lie Ala Gly Leu Met His Gin Ala 
240 245 250 255 



ttg act gag gag ate act gag ggc aac get gag gag ttg tac aag aag 

Leu Thr Glu Glu He Thr Glu Gly Asn Ala Glu Glu Leu Tyr Lys Lys 
275 280 285 

aac ttc ggc aac att gag cca gee tct ett gca atg ect gag tac etc 

Asn Phe Gly Asn lie Glu Pro Ala Ser Leu Ala Met Pro Glu Tyr Leu 

290 295 300 



aag get tec aac ttc ggt caa cag gag tac age aac aac cag etc ate 
Lys Ala Ser Asn Phe Gly Gin Gin Glu Tyr Ser Asn Asn Gin Leu lie 
320 325 330 335 



gtc agg act gag ggt get ect caa gtg aac att gag tac tct gee aac 
Val Arg Thr Glu Gly Ala Pro Gin Val Asn He Glu Tyr Ser Ala Asn 
400 405 410 415 



672 



gee tac gag aac gtc agg gag gtc ate caa ett caa gae ect ggt ett 720 
Ala Tyr Glu Asn Val Arg Glu Val He Gin Leu Gin Asp Pro Gly Leu 
225 230 235 



768 



tec ttg etc ggt ate aac gee tec ate tct ect gag ttg ttc aac ate 816 
Ser Leu Leu Gly lie Asn Ala Ser He Ser Pro Glu Leu Phe Asn He 
260 265 270 



864 



912 



aag agg tac tac aac ttg tct gat gag gag ett tct caa ttc att ggc 960 
Lys Arg Tyr Tyr Asn Leu Ser Asp Glu Glu Leu Ser Gin Phe He Gly 
305 310 315 



1008 



act cca gtt gtg aac tec tct gat ggc act gtg aag gtc tac cgc ate 1056 
Thr Pro Val Val Asn Ser Ser Asp Gly Thr Val Lys Val Tyr Arg He 
340 345 350 

aca cgt gag tac acc aca aac gee tac caa atg gat gtt gag ttg ttc 1104 
Thr Arg Glu Tyr Thr Thr Asn Ala Tyr Gin Met Asp Val Glu Leu Phe 
355 360 365 

cca ttc ggt ggt gag aac tac aga ett gae tac aag ttc aag aae ttc 1152 
Pro Phe Gly Gly Glu Asn Tyr Arg Leu Asp Tyr Lys Phe Lys Asn Phe 
370 375 380 

tac aac gcc tec tac etc tec ate aag ttg aac gae aag agg gag ett 1200 
Tyr Asn Ala Ser Tyr Leu Ser He Lys Leu Asn Asp Lys Arg Glu Leu 
385 390 395 



1248 



atc acc ctc aac aca gct gac atc tct caa cca ttc gag att ggt ttg 1296
Ile Thr Leu Asn Thr Ala Asp Ile Ser Gln Pro Phe Glu Ile Gly Leu
420 425 430

acc aga gtc ctt ccc tct ggc tcc tgg gcc tac gct gca gcc aag ttc 1344
Thr Arg Val Leu Pro Ser Gly Ser Trp Ala Tyr Ala Ala Ala Lys Phe
435 440 445

act gtt gag gag tac aac cag tac tct ttc ctc ttg aag ctc aac aag 1392
Thr Val Glu Glu Tyr Asn Gln Tyr Ser Phe Leu Leu Lys Leu Asn Lys
450 455 460

gca att cgt ctc agc aga gcc act gag ttg tct ccc acc atc ttg gag 1440
Ala Ile Arg Leu Ser Arg Ala Thr Glu Leu Ser Pro Thr Ile Leu Glu
465 470 475

ggc att gtg agg tct gtc aac ctt caa ctt gac atc aac act gat gtg 1488
Gly Ile Val Arg Ser Val Asn Leu Gln Leu Asp Ile Asn Thr Asp Val
480 485 490 495

ctt ggc aag gtc ttc ctc acc aag tac tac atg caa cgc tac gcc atc 1536
Leu Gly Lys Val Phe Leu Thr Lys Tyr Tyr Met Gln Arg Tyr Ala Ile
500 505 510

cat gct gag act gca ctc atc ctc tgc aac gca ccc atc tct caa cgc 1584
His Ala Glu Thr Ala Leu Ile Leu Cys Asn Ala Pro Ile Ser Gln Arg
515 520 525

tcc tac gac aac cag cct tcc cag ttc gac agg ctc ttc aac act cct 1632
Ser Tyr Asp Asn Gln Pro Ser Gln Phe Asp Arg Leu Phe Asn Thr Pro
530 535 540

ctc ttg aac ggc cag tac ttc tcc act ggt gat gag gag att gac ctc 1680
Leu Leu Asn Gly Gln Tyr Phe Ser Thr Gly Asp Glu Glu Ile Asp Leu
545 550 555

aac tct ggc tcc aca ggt gac tgg aga aag acc atc ttg aag agg gcc 1728
Asn Ser Gly Ser Thr Gly Asp Trp Arg Lys Thr Ile Leu Lys Arg Ala
560 565 570 575

ttc aac att gat gat gtc tct ctc ttc cgt ctc ttg aag atc aca gat 1776
Phe Asn Ile Asp Asp Val Ser Leu Phe Arg Leu Leu Lys Ile Thr Asp
580 585 590



cac gac aac aag gat ggc aag atc aag aac aac ttg aag aac ctt tcc 1824
His Asp Asn Lys Asp Gly Lys Ile Lys Asn Asn Leu Lys Asn Leu Ser
595 600 605

aac ctc tac att ggc aag ttg ctt gca gac atc cac caa ctc acc att 1872
Asn Leu Tyr Ile Gly Lys Leu Leu Ala Asp Ile His Gln Leu Thr Ile
610 615 620

gat gag ttg gac ctc ttg ctc att gca gtc ggt gag ggc aag acc aac 1920
Asp Glu Leu Asp Leu Leu Leu Ile Ala Val Gly Glu Gly Lys Thr Asn
625 630 635

ctc tct gca atc tct gac aag cag ttg gca acc ctc atc agg aag ttg 1968
Leu Ser Ala Ile Ser Asp Lys Gln Leu Ala Thr Leu Ile Arg Lys Leu
640 645 650 655

aac acc atc acc tcc tgg ctt cac acc cag aag tgg tct gtc ttc caa 2016
Asn Thr Ile Thr Ser Trp Leu His Thr Gln Lys Trp Ser Val Phe Gln
660 665 670

ctc ttc atc atg acc agc acc tcc tac aac aag acc ctc act cct gag 2064
Leu Phe Ile Met Thr Ser Thr Ser Tyr Asn Lys Thr Leu Thr Pro Glu
675 680 685

atc aag aac ctc ttg gac aca gtc tac cac ggt ctc caa ggc ttc gac 2112
Ile Lys Asn Leu Leu Asp Thr Val Tyr His Gly Leu Gln Gly Phe Asp
690 695 700

aag gac aag gct gac ttg ctt cat gtc atg gct ccc tac att gca gcc 2160
Lys Asp Lys Ala Asp Leu Leu His Val Met Ala Pro Tyr Ile Ala Ala
705 710 715

acc ctc caa ctc tcc tct gag aac gtg gct cac tct gtc ttg ctc tgg 2208
Thr Leu Gln Leu Ser Ser Glu Asn Val Ala His Ser Val Leu Leu Trp
720 725 730 735

gct gac aag ctc caa cct ggt gat ggt gcc atg act gct gag aag ttc 2256
Ala Asp Lys Leu Gln Pro Gly Asp Gly Ala Met Thr Ala Glu Lys Phe
740 745 750

tgg gac tgg ctc aac acc aag tac aca cca ggc tcc tct gag gct gtt 2304
Trp Asp Trp Leu Asn Thr Lys Tyr Thr Pro Gly Ser Ser Glu Ala Val
755 760 765

gag act caa gag cac att gtg caa tac tgc cag gct ctt gca cag ttg 2352
Glu Thr Gln Glu His Ile Val Gln Tyr Cys Gln Ala Leu Ala Gln Leu
770 775 780

gag atg gtc tac cac tcc act ggc atc aac gag aac gct ttc aga ctc 2400
Glu Met Val Tyr His Ser Thr Gly Ile Asn Glu Asn Ala Phe Arg Leu
785 790 795



ttc gtc acc aag cct gag atg ttc ggt gct gcc aca ggt gct gca cct 2448
Phe Val Thr Lys Pro Glu Met Phe Gly Ala Ala Thr Gly Ala Ala Pro
800 805 810 815

gct cat gat gct ctc tcc ctc atc atg ttg acc agg ttc gct gac tgg 2496
Ala His Asp Ala Leu Ser Leu Ile Met Leu Thr Arg Phe Ala Asp Trp
820 825 830

gtc aac gct ctt ggt gag aag gct tcc tct gtc ttg gct gcc ttc gag 2544
Val Asn Ala Leu Gly Glu Lys Ala Ser Ser Val Leu Ala Ala Phe Glu
835 840 845

gcc aac tcc ctc act gct gag caa ctt gct gat gcc atg aac ctt gat 2592
Ala Asn Ser Leu Thr Ala Glu Gln Leu Ala Asp Ala Met Asn Leu Asp
850 855 860

gcc aac ctc ttg ctc caa gct tcc att caa gct cag aac cac caa cac 2640
Ala Asn Leu Leu Leu Gln Ala Ser Ile Gln Ala Gln Asn His Gln His
865 870 875

ctc cca cct gtc act cca gag aac gct ttc tcc tgc tgg acc tcc atc 2688
Leu Pro Pro Val Thr Pro Glu Asn Ala Phe Ser Cys Trp Thr Ser Ile
880 885 890 895

aac acc atc ctc caa tgg gtc aac gtg gct cag caa ctc aac gtg gct 2736
Asn Thr Ile Leu Gln Trp Val Asn Val Ala Gln Gln Leu Asn Val Ala
900 905 910

cca caa ggt gtc tct gct ttg gtc ggt ctt gac tac atc cag tcc atg 2784
Pro Gln Gly Val Ser Ala Leu Val Gly Leu Asp Tyr Ile Gln Ser Met
915 920 925

aag gag aca cca acc tac gct caa tgg gag aac gca gct ggt gtc ttg 2832
Lys Glu Thr Pro Thr Tyr Ala Gln Trp Glu Asn Ala Ala Gly Val Leu
930 935 940

act gct ggt ctc aac tcc caa cag gcc aac acc ctc cat gct ttc ttg 2880
Thr Ala Gly Leu Asn Ser Gln Gln Ala Asn Thr Leu His Ala Phe Leu
945 950 955

gat gag tct cgc tct gct gcc ctc tcc acc tac tac atc agg caa gtc 2928
Asp Glu Ser Arg Ser Ala Ala Leu Ser Thr Tyr Tyr Ile Arg Gln Val
960 965 970 975

gcc aag gca gct gct gcc atc aag tct cgc gat gac ctc tac caa tac 2976
Ala Lys Ala Ala Ala Ala Ile Lys Ser Arg Asp Asp Leu Tyr Gln Tyr
980 985 990

ctc ctc att gac aac cag gtc tct gct gcc atc aag acc acc agg atc 3024
Leu Leu Ile Asp Asn Gln Val Ser Ala Ala Ile Lys Thr Thr Arg Ile
995 1000 1005

gct gag gcc atc gct tcc atc caa ctc tac gtc aac cgc gct ctt gag 3072
Ala Glu Ala Ile Ala Ser Ile Gln Leu Tyr Val Asn Arg Ala Leu Glu
1010 1015 1020

aac gtt gag gag aac gcc aac tct ggt gtc atc tct cgc caa ttc ttc 3120
Asn Val Glu Glu Asn Ala Asn Ser Gly Val Ile Ser Arg Gln Phe Phe
1025 1030 1035

atc gac tgg gac aag tac aac aag agg tac tcc acc tgg gct ggt gtc 3168
Ile Asp Trp Asp Lys Tyr Asn Lys Arg Tyr Ser Thr Trp Ala Gly Val
1040 1045 1050 1055

tct caa ctt gtc tac tac cca gag aac tac att gac cca acc atg agg 3216
Ser Gln Leu Val Tyr Tyr Pro Glu Asn Tyr Ile Asp Pro Thr Met Arg
1060 1065 1070

att ggt cag acc aag atg atg gat gct ctc ttg caa tct gtc tcc caa 3264
Ile Gly Gln Thr Lys Met Met Asp Ala Leu Leu Gln Ser Val Ser Gln
1075 1080 1085

agc caa ctc aac gct gac act gtg gag gat gcc ttc atg agc tac ctc 3312
Ser Gln Leu Asn Ala Asp Thr Val Glu Asp Ala Phe Met Ser Tyr Leu
1090 1095 1100

acc tcc ttc gag caa gtt gcc aac ctc aag gtc atc tct gct tac cat 3360
Thr Ser Phe Glu Gln Val Ala Asn Leu Lys Val Ile Ser Ala Tyr His
1105 1110 1115

gac aac atc aac aac gac caa ggt ctc acc tac ttc att ggt ctc tct 3408
Asp Asn Ile Asn Asn Asp Gln Gly Leu Thr Tyr Phe Ile Gly Leu Ser
1120 1125 1130 1135

gag act gat gct ggt gag tac tac tgg aga tcc gtg gac cac agc aag 3456
Glu Thr Asp Ala Gly Glu Tyr Tyr Trp Arg Ser Val Asp His Ser Lys
1140 1145 1150



ttc aac gat ggc aag ttc gct gca aac gct tgg tct gag tgg cac aag 3504
Phe Asn Asp Gly Lys Phe Ala Ala Asn Ala Trp Ser Glu Trp His Lys
1155 1160 1165

att gac tgc cct atc aac cca tac aag tcc acc atc aga cct gtc atc 3552
Ile Asp Cys Pro Ile Asn Pro Tyr Lys Ser Thr Ile Arg Pro Val Ile
1170 1175 1180

tac aag agc cgc ctc tac ttg ctc tgg ctt gag cag aag gag atc acc 3600
Tyr Lys Ser Arg Leu Tyr Leu Leu Trp Leu Glu Gln Lys Glu Ile Thr
1185 1190 1195

aag caa act ggc aac tcc aag gat ggt tac caa act gag act gac tac 3648
Lys Gln Thr Gly Asn Ser Lys Asp Gly Tyr Gln Thr Glu Thr Asp Tyr
1200 1205 1210 1215

cgc tac gag ttg aag ttg gct cac atc cgc tac gat ggt acc tgg aac 3696
Arg Tyr Glu Leu Lys Leu Ala His Ile Arg Tyr Asp Gly Thr Trp Asn
1220 1225 1230

act cca atc acc ttc gat gtc aac aag aag atc agc gag ttg aag ttg 3744
Thr Pro Ile Thr Phe Asp Val Asn Lys Lys Ile Ser Glu Leu Lys Leu
1235 1240 1245

gag aag aac cgt gct cct ggt ctc tac tgc gct ggt tac caa ggt gag 3792
Glu Lys Asn Arg Ala Pro Gly Leu Tyr Cys Ala Gly Tyr Gln Gly Glu
1250 1255 1260

gac acc ctc ttg gtc atg ttc tac aac cag caa gac acc ctt gac tcc 3840
Asp Thr Leu Leu Val Met Phe Tyr Asn Gln Gln Asp Thr Leu Asp Ser
1265 1270 1275

tac aag aac gct tcc atg caa ggt ctc tac atc ttc gct gac atg gct 3888
Tyr Lys Asn Ala Ser Met Gln Gly Leu Tyr Ile Phe Ala Asp Met Ala
1280 1285 1290 1295

tcc aag gac atg act cca gag caa agc aac gtc tac cgt gac aac tcc 3936
Ser Lys Asp Met Thr Pro Glu Gln Ser Asn Val Tyr Arg Asp Asn Ser
1300 1305 1310

tac caa cag ttc gac acc aac aac gtc agg cgt gtc aac aac aga tac 3984
Tyr Gln Gln Phe Asp Thr Asn Asn Val Arg Arg Val Asn Asn Arg Tyr
1315 1320 1325

gct gag gac tac gag atc cca agc tct gtc agc tct cgc aag gac tac 4032
Ala Glu Asp Tyr Glu Ile Pro Ser Ser Val Ser Ser Arg Lys Asp Tyr
1330 1335 1340

ggc tgg ggt gac tac tac ctc agc atg gtg tac aac ggt gac atc cca 4080
Gly Trp Gly Asp Tyr Tyr Leu Ser Met Val Tyr Asn Gly Asp Ile Pro
1345 1350 1355

acc atc aac tac aag gct gcc tct tcc gac ctc aaa atc tac atc agc 4128
Thr Ile Asn Tyr Lys Ala Ala Ser Ser Asp Leu Lys Ile Tyr Ile Ser
1360 1365 1370 1375

cca aag ctc agg atc atc cac aac ggc tac gag ggt cag aag agg aac 4176
Pro Lys Leu Arg Ile Ile His Asn Gly Tyr Glu Gly Gln Lys Arg Asn
1380 1385 1390

cag tgc aac ttg atg aac aag tac ggc aag ttg ggt gac aag ttc att 4224
Gln Cys Asn Leu Met Asn Lys Tyr Gly Lys Leu Gly Asp Lys Phe Ile
1395 1400 1405

gtc tac acc tct ctt ggt gtc aac cca aac aac agc tcc aac aag ctc 4272
Val Tyr Thr Ser Leu Gly Val Asn Pro Asn Asn Ser Ser Asn Lys Leu
1410 1415 1420

atg ttc tac cca gtc tac caa tac tct ggc aac acc tct ggt ctc aac 4320
Met Phe Tyr Pro Val Tyr Gln Tyr Ser Gly Asn Thr Ser Gly Leu Asn
1425 1430 1435

cag ggt aga ctc ttg ttc cac agg gac acc acc tac cca agc aag gtg 4368
Gln Gly Arg Leu Leu Phe His Arg Asp Thr Thr Tyr Pro Ser Lys Val
1440 1445 1450 1455

gag gct tgg att cct ggt gcc aag agg tcc ctc acc aac cag aac gct 4416
Glu Ala Trp Ile Pro Gly Ala Lys Arg Ser Leu Thr Asn Gln Asn Ala
1460 1465 1470

gcc att ggt gat gac tac gcc aca gac tcc ctc aac aag cct gat gac 4464
Ala Ile Gly Asp Asp Tyr Ala Thr Asp Ser Leu Asn Lys Pro Asp Asp
1475 1480 1485

ctc aag cag tac atc ttc atg act gac tcc aag ggc aca gcc act gat 4512
Leu Lys Gln Tyr Ile Phe Met Thr Asp Ser Lys Gly Thr Ala Thr Asp
1490 1495 1500

gtc tct ggt cca gtg gag atc aac act gca atc agc cca gcc aag gtc 4560
Val Ser Gly Pro Val Glu Ile Asn Thr Ala Ile Ser Pro Ala Lys Val
1505 1510 1515

caa atc att gtc aag gct ggt ggc aag gag caa acc ttc aca gct gac 4608
Gln Ile Ile Val Lys Ala Gly Gly Lys Glu Gln Thr Phe Thr Ala Asp
1520 1525 1530 1535

aag gat gtc tcc atc cag cca agc cca tcc ttc gat gag atg aac tac 4656
Lys Asp Val Ser Ile Gln Pro Ser Pro Ser Phe Asp Glu Met Asn Tyr
1540 1545 1550

caa ttc aac gct ctt gag att gat ggt tct ggc ctc aac ttc atc aac 4704
Gln Phe Asn Ala Leu Glu Ile Asp Gly Ser Gly Leu Asn Phe Ile Asn
1555 1560 1565

aac tct gct tcc att gat gtc acc ttc act gcc ttc gct gag gat ggc 4752
Asn Ser Ala Ser Ile Asp Val Thr Phe Thr Ala Phe Ala Glu Asp Gly
1570 1575 1580

cgc aag ttg ggt tac gag agc ttc tcc atc cca gtc acc ctt aag gtt 4800
Arg Lys Leu Gly Tyr Glu Ser Phe Ser Ile Pro Val Thr Leu Lys Val
1585 1590 1595

tcc act gac aac gca ctc acc ctt cat cac aac gag aac ggt gct cag 4848
Ser Thr Asp Asn Ala Leu Thr Leu His His Asn Glu Asn Gly Ala Gln
1600 1605 1610 1615

tac atg caa tgg caa agc tac cgc acc agg ttg aac acc ctc ttc gca 4896
Tyr Met Gln Trp Gln Ser Tyr Arg Thr Arg Leu Asn Thr Leu Phe Ala
1620 1625 1630

agg caa ctt gtg gcc cgt gcc acc aca ggc att gac acc atc ctc agc 4944
Arg Gln Leu Val Ala Arg Ala Thr Thr Gly Ile Asp Thr Ile Leu Ser
1635 1640 1645

atg gag acc cag aac atc caa gag cca cag ttg ggc aag ggt ttc tac 4992
Met Glu Thr Gln Asn Ile Gln Glu Pro Gln Leu Gly Lys Gly Phe Tyr
1650 1655 1660

gcc acc ttc gtc atc cca cct tac aac ctc agc act cat ggt gat gag 5040
Ala Thr Phe Val Ile Pro Pro Tyr Asn Leu Ser Thr His Gly Asp Glu
1665 1670 1675

agg tgg ttc aag ctc tac atc aag cac gtg gtt gac aac aac tcc cac 5088
Arg Trp Phe Lys Leu Tyr Ile Lys His Val Val Asp Asn Asn Ser His
1680 1685 1690 1695

atc atc tac tct ggt caa ctc act gac acc aac atc aac atc acc ctc 5136
Ile Ile Tyr Ser Gly Gln Leu Thr Asp Thr Asn Ile Asn Ile Thr Leu
1700 1705 1710

ttc atc cca ctt gac gat gtc cca ctc aac cag gac tac cat gcc aag 5184
Phe Ile Pro Leu Asp Asp Val Pro Leu Asn Gln Asp Tyr His Ala Lys
1715 1720 1725

gtc tac atg acc ttc aag aag tct cca tct gat ggc acc tgg tgg ggt 5232
Val Tyr Met Thr Phe Lys Lys Ser Pro Ser Asp Gly Thr Trp Trp Gly
1730 1735 1740

cca cac ttc gtc cgt gat gac aag ggc atc gtc acc atc aac cca aag 5280
Pro His Phe Val Arg Asp Asp Lys Gly Ile Val Thr Ile Asn Pro Lys
1745 1750 1755

tcc atc ctc acc cac ttc gag tct gtc aac gtt ctc aac aac atc tcc 5328
Ser Ile Leu Thr His Phe Glu Ser Val Asn Val Leu Asn Asn Ile Ser
1760 1765 1770 1775

tct gag cca atg gac ttc tct ggt gcc aac tcc ctc tac ttc tgg gag 5376
Ser Glu Pro Met Asp Phe Ser Gly Ala Asn Ser Leu Tyr Phe Trp Glu
1780 1785 1790

ttg ttc tac tac aca cca atg ctt gtg gct caa agg ttg ctc cat gag 5424
Leu Phe Tyr Tyr Thr Pro Met Leu Val Ala Gln Arg Leu Leu His Glu
1795 1800 1805

cag aac ttc gat gag gcc aac agg tgg ctc aag tac gtc tgg agc cca 5472
Gln Asn Phe Asp Glu Ala Asn Arg Trp Leu Lys Tyr Val Trp Ser Pro
1810 1815 1820

tct ggt tac att gtg cat ggt caa atc cag aac tac caa tgg aac gtc 5520
Ser Gly Tyr Ile Val His Gly Gln Ile Gln Asn Tyr Gln Trp Asn Val
1825 1830 1835

agg cca ttg ctt gag gac acc tcc tgg aac tct gac cca ctt gac tct 5568
Arg Pro Leu Leu Glu Asp Thr Ser Trp Asn Ser Asp Pro Leu Asp Ser
1840 1845 1850 1855

gtg gac cct gat gct gtg gct caa cat gac cca atg cac tac aag gtc 5616
Val Asp Pro Asp Ala Val Ala Gln His Asp Pro Met His Tyr Lys Val
1860 1865 1870

tcc acc ttc atg agg acc ttg gac ctc ttg att gcc aga ggt gac cat 5664
Ser Thr Phe Met Arg Thr Leu Asp Leu Leu Ile Ala Arg Gly Asp His
1875 1880 1885

gct tac cgc caa ttg gag agg gac acc ctc aac gag gca aag atg tgg 5712
Ala Tyr Arg Gln Leu Glu Arg Asp Thr Leu Asn Glu Ala Lys Met Trp
1890 1895 1900

tac atg caa gct ctc cac ctc ttg ggt gac aag cca tac ctc cca ctc 5760
Tyr Met Gln Ala Leu His Leu Leu Gly Asp Lys Pro Tyr Leu Pro Leu
1905 1910 1915

agc acc act tgg tcc gac cca agg ttg gac cgt gct gct gac atc acc 5808
Ser Thr Thr Trp Ser Asp Pro Arg Leu Asp Arg Ala Ala Asp Ile Thr
1920 1925 1930 1935

act cag aac gct cat gac tct gcc att gtt gct ctc agg cag aac atc 5856
Thr Gln Asn Ala His Asp Ser Ala Ile Val Ala Leu Arg Gln Asn Ile
1940 1945 1950

cca act cct gct cca ctc tcc ctc aga tct gct aac acc ctc act gac 5904
Pro Thr Pro Ala Pro Leu Ser Leu Arg Ser Ala Asn Thr Leu Thr Asp
1955 1960 1965

ttg ttc ctc cca cag atc aac gag gtc atg atg aac tac tgg caa acc 5952
Leu Phe Leu Pro Gln Ile Asn Glu Val Met Met Asn Tyr Trp Gln Thr
1970 1975 1980

ttg gct caa agg gtc tac aac ctc aga cac aac ctc tcc att gat ggt 6000
Leu Ala Gln Arg Val Tyr Asn Leu Arg His Asn Leu Ser Ile Asp Gly
1985 1990 1995



caa cca ctc tac ctc cca atc tac gcc aca cca gct gac cca aag gct 6048
Gln Pro Leu Tyr Leu Pro Ile Tyr Ala Thr Pro Ala Asp Pro Lys Ala
2000 2005 2010 2015

ctt ctc tct gct gct gtg gct acc agc caa ggt ggt ggc aag ctc cca 6096
Leu Leu Ser Ala Ala Val Ala Thr Ser Gln Gly Gly Gly Lys Leu Pro
2020 2025 2030

gag tcc ttc atg tcc ctc tgg agg ttc cca cac atg ttg gag aac gcc 6144
Glu Ser Phe Met Ser Leu Trp Arg Phe Pro His Met Leu Glu Asn Ala
2035 2040 2045

cgt ggc atg gtc tcc caa ctc acc cag ttc ggt tcc acc ctc cag aac 6192
Arg Gly Met Val Ser Gln Leu Thr Gln Phe Gly Ser Thr Leu Gln Asn
2050 2055 2060

atc att gag agg caa gat gct gag gct ctc aac gct ttg ctc cag aac 6240
Ile Ile Glu Arg Gln Asp Ala Glu Ala Leu Asn Ala Leu Leu Gln Asn
2065 2070 2075

cag gca gct gag ttg atc ctc acc aac ttg tcc atc caa gac aag acc 6288
Gln Ala Ala Glu Leu Ile Leu Thr Asn Leu Ser Ile Gln Asp Lys Thr
2080 2085 2090 2095

att gag gag ctt gat gct gag aag aca gtc ctt gag aag agc aag gct 6336
Ile Glu Glu Leu Asp Ala Glu Lys Thr Val Leu Glu Lys Ser Lys Ala
2100 2105 2110

ggt gcc caa tct cgc ttc gac tcc tac ggc aag ctc tac gat gag aac 6384
Gly Ala Gln Ser Arg Phe Asp Ser Tyr Gly Lys Leu Tyr Asp Glu Asn
2115 2120 2125

atc aac gct ggt gag aac cag gcc atg acc ctc agg gct tcc gca gct 6432
Ile Asn Ala Gly Glu Asn Gln Ala Met Thr Leu Arg Ala Ser Ala Ala
2130 2135 2140

ggt ctc acc act gct gtc caa gcc tct cgc ttg gct ggt gca gct gct 6480
Gly Leu Thr Thr Ala Val Gln Ala Ser Arg Leu Ala Gly Ala Ala Ala
2145 2150 2155

gac ctc gtt cca aac atc ttc ggt ttc gct ggt ggt ggc tcc aga tgg 6528
Asp Leu Val Pro Asn Ile Phe Gly Phe Ala Gly Gly Gly Ser Arg Trp
2160 2165 2170 2175

ggt gcc att gct gag gct acc ggt tac gtc atg gag ttc tct gcc aac 6576
Gly Ala Ile Ala Glu Ala Thr Gly Tyr Val Met Glu Phe Ser Ala Asn
2180 2185 2190



gtc atg aac act gag gct gac aag atc agc caa tct gag acc tac aga 6624
Val Met Asn Thr Glu Ala Asp Lys Ile Ser Gln Ser Glu Thr Tyr Arg
2195 2200 2205

agg cgc cgt caa gag tgg gag atc caa agg aac aac gct gag gca gag 6672
Arg Arg Arg Gln Glu Trp Glu Ile Gln Arg Asn Asn Ala Glu Ala Glu
2210 2215 2220

ttg aag caa atc gat gct caa ctc aag tcc ttg gct gtc aga agg gag 6720
Leu Lys Gln Ile Asp Ala Gln Leu Lys Ser Leu Ala Val Arg Arg Glu
2225 2230 2235

gct gct gtc ctc cag aag acc tcc ctc aag acc caa cag gag caa acc 6768
Ala Ala Val Leu Gln Lys Thr Ser Leu Lys Thr Gln Gln Glu Gln Thr
2240 2245 2250 2255

cag tcc cag ttg gct ttc ctc caa agg aag ttc tcc aac cag gct ctc 6816
Gln Ser Gln Leu Ala Phe Leu Gln Arg Lys Phe Ser Asn Gln Ala Leu
2260 2265 2270

tac aac tgg ctc aga ggc cgc ttg gct gcc atc tac ttc caa ttc tac 6864
Tyr Asn Trp Leu Arg Gly Arg Leu Ala Ala Ile Tyr Phe Gln Phe Tyr
2275 2280 2285

gac ctt gct gtg gcc agg tgc ctc atg gct gag caa gcc tac cgc tgg 6912
Asp Leu Ala Val Ala Arg Cys Leu Met Ala Glu Gln Ala Tyr Arg Trp
2290 2295 2300

gag ttg aac gat gac tcc gcc agg ttc atc aag cca ggt gct tgg caa 6960
Glu Leu Asn Asp Asp Ser Ala Arg Phe Ile Lys Pro Gly Ala Trp Gln
2305 2310 2315

ggc acc tac gct ggt ctc ctt gct ggt gag acc ctc atg ctc tcc ttg 7008
Gly Thr Tyr Ala Gly Leu Leu Ala Gly Glu Thr Leu Met Leu Ser Leu
2320 2325 2330 2335

gct caa atg gag gat gct cac ctc aag agg gac aag agg gct ttg gag 7056
Ala Gln Met Glu Asp Ala His Leu Lys Arg Asp Lys Arg Ala Leu Glu
2340 2345 2350

gtg gag agg aca gtc tcc ctt gct gag gtc tac gct ggt ctc cca aag 7104
Val Glu Arg Thr Val Ser Leu Ala Glu Val Tyr Ala Gly Leu Pro Lys
2355 2360 2365

gac aac ggt cca ttc tcc ctt gct caa gag att gac aag ttg gtc agc 7152
Asp Asn Gly Pro Phe Ser Leu Ala Gln Glu Ile Asp Lys Leu Val Ser
2370 2375 2380

caa ggt tct ggt tct gct ggt tct ggt aac aac aac ttg gct ttc ggc 7200
Gln Gly Ser Gly Ser Ala Gly Ser Gly Asn Asn Asn Leu Ala Phe Gly
2385 2390 2395

gct ggt act gac acc aag acc tcc ctc caa gcc tct gtc tcc ttc gct 7248
Ala Gly Thr Asp Thr Lys Thr Ser Leu Gln Ala Ser Val Ser Phe Ala
2400 2405 2410 2415

gac ctc aag atc agg gag gac tac cca gct tcc ctt ggc aag atc agg 7296
Asp Leu Lys Ile Arg Glu Asp Tyr Pro Ala Ser Leu Gly Lys Ile Arg
2420 2425 2430

cgc atc aag caa atc tct gtc acc ctc cca gct ctc ttg ggt cca tac 7344
Arg Ile Lys Gln Ile Ser Val Thr Leu Pro Ala Leu Leu Gly Pro Tyr
2435 2440 2445

caa gat gtc caa gca atc ctc tcc tac ggt gac aag gct ggt ttg gcg 7392
Gln Asp Val Gln Ala Ile Leu Ser Tyr Gly Asp Lys Ala Gly Leu Ala
2450 2455 2460

aac ggt tgc gag gct ctt gct gtc tct cat ggc atg aac gac tct ggt 7440
Asn Gly Cys Glu Ala Leu Ala Val Ser His Gly Met Asn Asp Ser Gly
2465 2470 2475

caa ttc caa ctt gac ttc aac gat ggc aag ttc ctc cca ttc gag ggc 7488
Gln Phe Gln Leu Asp Phe Asn Asp Gly Lys Phe Leu Pro Phe Glu Gly
2480 2485 2490 2495

att gcc att gac caa ggc acc ctc acc ctc tcc ttc cca aac gct tcc 7536
Ile Ala Ile Asp Gln Gly Thr Leu Thr Leu Ser Phe Pro Asn Ala Ser
2500 2505 2510

atg cca gag aag gga aag caa gcc acc atg ctc aag acc ctc aac gat 7584
Met Pro Glu Lys Gly Lys Gln Ala Thr Met Leu Lys Thr Leu Asn Asp
2515 2520 2525

atc atc ctc cac atc agg tac acc atc aag tgagctc 7621
Ile Ile Leu His Ile Arg Tyr Thr Ile Lys
2530 2535